
# High Scalability, High Availability, and High Stability Design Patterns in Back-end Systems

An updated and curated list of selected readings that illustrate High Scalability, High Availability, and High Stability design patterns in back-end systems. Concepts are explained in articles by notable engineers (Werner Vogels, James Hamilton, Jeff Atwood, Martin Fowler, Robert C. Martin, Tom White, Martin Kleppmann) and in high-quality reference sources (highscalability.com, infoq.com, official engineering blogs, etc.). Case studies are taken from battle-tested systems that serve millions to billions of users (Netflix, Alibaba, Flipkart, LINE, Spotify, etc.).

#### What if your Back-end went slow?
> Understand your problem first: is it a performance problem (slow for a single user) or a scalability problem (fast for a single user but slow under heavy load)? Review the [design principles](#principles) to tell them apart. You can also check some [talks](#talks) by engineers from tech giants (Google, Facebook, Instagram, etc.) to see how they build and scale their systems.

#### What if your Back-end went down?
> "Even if you lose all one day, you can build all over again if you retain your calm!" - Thuan Pham, CTO at Uber Technologies Inc.

#### Community Power

> Contributions are greatly welcome! You may want to take a look at the [contribution guidelines](CONTRIBUTING.md).
> If you find this project helpful, [please help me share it on Twitter!](https://ctt.ec/V8B2p) Thank you very much :heart: 

## Contents
- [Principles](#principles)
- [Scalability](#scalability)
- [Availability](#availability)
- [Stability](#stability)
- [Performance](#performance)
- [Other Aspects](#others)
- [Talks](#talks)
- [Books](#books)

## Principles
* [My Scaling Hero - Jeff Atwood](https://blog.codinghorror.com/my-scaling-hero/)
* [Principles of Chaos Engineering](https://www.usenix.org/conference/srecon17americas/program/presentation/rosenthal)
* [Finding the Order in Chaos](https://www.usenix.org/conference/srecon16/program/presentation/lueder)
* [The Clean Architecture - Robert C. Martin (Uncle Bob)](https://8thlight.com/blog/uncle-bob/2012/08/13/the-clean-architecture.html)
* [The Twelve-Factor App](https://12factor.net/)
* [10 Common (Large-Scale) Software Architectural Patterns in a Nutshell](https://towardsdatascience.com/10-common-software-architectural-patterns-in-a-nutshell-a0b47a1e9013)
* [CAP Theorem and Trade-offs](http://robertgreiner.com/2014/08/cap-theorem-revisited/)
* [CAP Twelve Years Later: How the "Rules" Have Changed (2012) - Eric Brewer, VP of Infrastructure at Google](https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed)	
* [Scale Up or Scale Out, What it is and Why You Should Care](https://www.brianjgraf.com/2013/05/17/scalability-scale-up-scale-out-care/)
* [Scaling Up vs Scaling Out: Hidden Costs](https://blog.codinghorror.com/scaling-up-vs-scaling-out-hidden-costs/)
* [ACID and BASE](https://neo4j.com/blog/acid-vs-base-consistency-models-explained/)
* [Blocking/Non-Blocking and Sync/Async](https://blogs.msdn.microsoft.com/csliu/2009/08/27/io-concept-blockingnon-blocking-vs-syncasync/)
* [Why Non-Blocking?](https://techblog.bozho.net/why-non-blocking/)
* [Performance and Scalability of Databases](https://use-the-index-luke.com/sql/testing-scalability)
* [Database Isolation Levels and Effects on Performance and Scalability](http://highscalability.com/blog/2011/2/10/database-isolation-levels-and-their-effects-on-performance-a.html)
* [SQL versus NoSQL](https://www.upwork.com/hiring/data/sql-vs-nosql-databases-whats-the-difference/)
* [Practical NoSQL resilience design pattern for the enterprise (eBay)](https://www.ebayinc.com/stories/blogs/tech/practical-nosql-resilience-design-pattern-for-the-enterprise/)
* [SQL or NoSQL - Lesson Learned from Salesforce](https://engineering.salesforce.com/sql-or-nosql-9eaf1d92545b)
* [How Sharding Works](https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6)
* [Consistent Hashing - Tom White, author of 'Hadoop: the Definitive Guide'](http://www.tom-e-white.com/2007/11/consistent-hashing.html)
* [Uniform Consistent Hashing](https://medium.com/netflix-techblog/distributing-content-to-open-connect-3e3e391d4dc9)
* [Eventually Consistent - Werner Vogels, CTO at Amazon](https://www.allthingsdistributed.com/2008/12/eventually_consistent.html)
* [Cache is King!](https://www.stevesouders.com/blog/2012/10/11/cache-is-king/)
* [Anti-Caching](http://the-paper-trail.org/blog/paper-notes-anti-caching/)
* [Understand Latency](http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it)
* [Latency Numbers Every Programmer Should Know](http://norvig.com/21-days.html#answers)
* [Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO](http://highscalability.com/blog/2014/5/12/4-architecture-issues-when-scaling-web-applications-bottlene.html)	
* [20 Common Bottlenecks](http://highscalability.com/blog/2012/5/16/big-list-of-20-common-bottlenecks.html)
* [Life Beyond Distributed Transactions](https://queue.acm.org/detail.cfm?id=3025012)
* [Relying on Software to Redirect Traffic Reliably at Various Layers](https://www.usenix.org/conference/srecon15/program/presentation/taveira)
* [Advantages and Drawbacks of Microservices](https://cloudacademy.com/blog/microservices-architecture-challenge-advantage-drawback/)
* [Microservices Scale Cube](http://microservices.io/articles/scalecube.html)
* [Breaking Things on Purpose](https://www.usenix.org/conference/srecon17americas/program/presentation/andrus)
* [Avoid Over Engineering](https://medium.com/@rdsubhas/10-modern-software-engineering-mistakes-bc67fbef4fc8)
* [Scalability Worst Practices](https://www.infoq.com/articles/scalability-worst-practices)
* [Use Solid Technologies - Don’t Re-invent the Wheel - Keep It Simple!](https://medium.com/@DataStax/instagram-engineerings-3-rules-to-a-scalable-cloud-application-architecture-c44afed31406)
* [Why Over-Reusing is Bad](http://tech.transferwise.com/why-over-reusing-is-bad/)
* [Performance is a Feature](https://blog.codinghorror.com/performance-is-a-feature/)
* [Make Performance Part of Your Workflow](https://codeascraft.com/2014/12/11/make-performance-part-of-your-workflow/)
* [The Benefits of Server Side Rendering Over Client Side Rendering](https://medium.com/walmartlabs/the-benefits-of-server-side-rendering-over-client-side-rendering-5d07ff2cefe8)
* [Writing Code that Scales](https://blog.rackspace.com/writing-code-that-scales)
* [Automate and Abstract: Lessons from Facebook on Engineering for Scale](https://architecht.io/lessons-from-facebook-on-engineering-for-scale-f5716f0afc7a)
* [AWS Do's and Don'ts](https://8thlight.com/blog/sarah-sunday/2017/09/15/aws-dos-and-donts.html)
* [(UI) Design Doesn’t Scale - Stanley Wood, Design Director at Spotify](https://medium.com/@hellostanley/design-doesnt-scale-4d81e12cbc3e)
* [Design for Loose-coupling](http://bulgerpartners.com/how-loosely-coupled-architectures-are-helping-the-modernization-of-legacy-software/)
* [Design for Resiliency](http://highscalability.com/blog/2012/12/31/designing-for-resiliency-will-be-so-2013.html)
* [Design for Self-healing](https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/self-healing)
* [Design for Scaling Out](https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/scale-out)
* [Best Practices for Scaling Out](https://blog.openshift.com/best-practices-for-horizontal-application-scaling/)	
* [Design for Evolution](https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/design-for-evolution)	
* [Learn from Mistakes](http://highscalability.com/blog/2013/8/26/reddit-lessons-learned-from-mistakes-made-scaling-to-1-billi.html)
* [Linux Performance](http://www.brendangregg.com/linuxperf.html)
* [How To Design A Good API and Why it Matters - Joshua Bloch](https://www.infoq.com/presentations/effective-api-design)
* [Talks/Papers on Efficiency, Reliability, Scaling - James Hamilton, VP and Distinguished Engineer at AWS](http://mvdirona.com/jrh/work/)

## Scalability
* [Microservices and Orchestration](https://hackernoon.com/microservices-are-hard-an-invaluable-guide-to-microservices-2d06bd7bcf5d)
	* [Microservices Resource Guide - Martin Fowler, Chief Scientist at ThoughtWorks](https://martinfowler.com/microservices/)
	* [Microservices Patterns](http://microservices.io/patterns/)
	* [Thinking Inside the Container (8 parts) at Riot Games](https://engineering.riotgames.com/news/thinking-inside-container)
	* [Containerization at Pinterest](https://medium.com/@Pinterest_Engineering/containerization-at-pinterest-92295347f2f3)
	* [Techniques for Splitting Up a Codebase into Microservices and Artifacts at LinkedIn](https://engineering.linkedin.com/blog/2016/02/q-a-with-jim-brikman--splitting-up-a-codebase-into-microservices)
	* [The Evolution of Container Usage at Netflix](https://medium.com/netflix-techblog/the-evolution-of-container-usage-at-netflix-3abfc096781b)
	* [Dockerizing MySQL at Uber](https://eng.uber.com/dockerizing-mysql/)
	* [Testing of Microservices at Spotify](https://labs.spotify.com/2018/01/11/testing-of-microservices/)
	* [Organize Monolith Before Breaking it into Services at Weebly](https://medium.com/weebly-engineering/how-to-organize-your-monolith-before-breaking-it-into-services-69cbdb9248b0)
	* [Lessons learned running Docker in production at Treehouse](https://medium.com/treehouse-engineering/lessons-learned-running-docker-in-production-5dce99ece770)
	* [Inside a SoundCloud Microservice](https://developers.soundcloud.com/blog/inside-a-soundcloud-microservice)
	* [Microservices at BlaBlaCar](http://blablatech.com/blog/micro-service-at-blablacar)
	* [Operate Kubernetes Reliably at Stripe](https://stripe.com/blog/operating-kubernetes)
	* [Kubernetes Traffic Routing (2 parts) at Rakuten](https://techblog.rakuten.co.jp/2017/09/28/k8s-routing2/)
	* [Agrarian-Scale Kubernetes (3 parts) at New York Times](https://open.nytimes.com/agrarian-scale-kubernetes-part-3-ee459887ed7e)
	* [Mesos, Docker and Ochopod in Localization Services at Autodesk](http://cloudengineering.autodesk.com/blog/2015/11/mesos-docker-and-ochopod-in-autodesk-localization-services.html)
	* [Nanoservices at BBC Online](https://medium.com/bbc-design-engineering/powering-bbc-online-with-nanoservices-727840ba015b)
	* [PowerfulSeal: Testing Tool for Kubernetes Clusters at Bloomberg](https://www.techatbloomberg.com/blog/powerfulseal-testing-tool-kubernetes-clusters/)
	* [Conductor: Microservices Orchestrator at Netflix](https://medium.com/netflix-techblog/netflix-conductor-a-microservices-orchestrator-2e8d4771bf40)
	* [Making 10x Improvement in Release Times with Docker and Amazon ECS at Nextdoor](https://engblog.nextdoor.com/how-nextdoor-made-a-10x-improvement-in-release-times-with-docker-and-amazon-ecs-35aab52b726f)
* [Distributed Caching](https://www.wix.engineering/single-post/scaling-to-100m-to-cache-or-not-to-cache)
	* [Write-behind and Write-through](https://docs.oracle.com/cd/E15357_01/coh.360/e15723/cache_rtwtwbra.htm#COHDG5177)
	* [Eviction Policies](http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html)
	* [Peer-To-Peer Caching](https://en.wikipedia.org/wiki/P2P_caching)
	* [EVCache: Caching for a Global Netflix](https://medium.com/netflix-techblog/caching-for-a-global-netflix-7bcc457012f1)
	* [Memsniff: Robust Memcache Traffic Analyzer at Box.com](https://blog.box.com/blog/introducing-memsniff-robust-memcache-traffic-analyzer/)
	* [Caching with Consistent Hashing and Cache Smearing at Etsy](https://codeascraft.com/2017/11/30/how-etsy-caches/)
	* [An Analysis of Facebook Photo Caching](https://code.facebook.com/posts/220956754772273/an-analysis-of-facebook-photo-caching/)
	* [Reduce Memcached Memory Usage by 50% at Trivago](http://tech.trivago.com/2017/12/19/how-trivago-reduced-memcached-memory-usage-by-50/)
* [Distributed Tracking and Tracing](https://www.oreilly.com/ideas/understanding-the-value-of-distributed-tracing)
	* [Tracking Service Infrastructure at Scale at Shopify](https://www.usenix.org/conference/srecon17americas/program/presentation/arthorne)
	* [Distributed Tracing with Pintrace at Pinterest](https://medium.com/@Pinterest_Engineering/distributed-tracing-at-pinterest-with-new-open-source-tools-a4f8a5562f6b)
	* [Analyzing Distributed Trace Data at Pinterest](https://medium.com/@Pinterest_Engineering/analyzing-distributed-trace-data-6aae58919949)
	* [Distributed Tracing at Uber](https://eng.uber.com/distributed-tracing/)
	* [Data Checking at Dropbox](https://www.usenix.org/conference/srecon17asia/program/presentation/mah)
	* [Tracing distributed systems at Showmax](https://tech.showmax.com/2016/10/tracing-distributed-systems-at-showmax/)
* [Distributed Logging](https://blog.treasuredata.com/blog/2016/08/03/distributed-logging-architecture-in-the-container-era/)
	* [The Problem with Logging - Jeff Atwood](https://blog.codinghorror.com/the-problem-with-logging/)
	* [The Log: What Every Software Engineer Should Know](https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying)
	* [Using Logs to Build a Solid Data Infrastructure - Martin Kleppmann](https://www.confluent.io/blog/using-logs-to-build-a-solid-data-infrastructure-or-why-dual-writes-are-a-bad-idea/)
	* [Scalable and reliable log ingestion at Pinterest](https://medium.com/@Pinterest_Engineering/scalable-and-reliable-data-ingestion-at-pinterest-b921c2ee8754)
	* [Building DistributedLog at Twitter: High-performance replicated log service](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2015/building-distributedlog-twitter-s-high-performance-replicated-log-servic.html)
	* [Logging Service with Spark at CERN Accelerator](https://databricks.com/blog/2017/12/14/the-architecture-of-the-next-cern-accelerator-logging-service.html)
	* [Logging and Aggregation at Quora](https://engineering.quora.com/Logging-and-Aggregation-at-Quora)
	* [BookKeeper: Distributed Log Storage at Yahoo](https://yahooeng.tumblr.com/post/109908973316/bookkeeper-yahoos-distributed-log-storage-is)
	* [LogDevice: Distributed Data Store for Logs at Facebook](https://code.facebook.com/posts/357056558062811/logdevice-a-distributed-data-store-for-logs/)
* [Distributed Messaging](https://arxiv.org/pdf/1704.00411.pdf)
	* [When to use RabbitMQ or Kafka](https://content.pivotal.io/blog/understanding-when-to-use-rabbitmq-or-apache-kafka)
	* [Should You Put Several Event Types in the Same Kafka Topic? - Martin Kleppmann](https://www.confluent.io/blog/put-several-event-types-kafka-topic/)
	* [Kafka at Scale at LinkedIn](https://engineering.linkedin.com/kafka/running-kafka-scale)
	* [Delaying Asynchronous Message Processing with RabbitMQ at Indeed](http://engineering.indeedblog.com/blog/2017/06/delaying-messages/)
	* [Real-time Data Pipeline with Kafka at Yelp](https://engineeringblog.yelp.com/2016/07/billions-of-messages-a-day-yelps-real-time-data-pipeline.html)
	* [Audit Kafka End-to-End at Uber (count each message exactly once, audit a message across tiers)](https://eng.uber.com/chaperone/)
	* [Kafka for PaaS at Rakuten](https://techblog.rakuten.co.jp/2016/01/28/rakuten-paas-kafka/)
	* [Deduplication Techniques](https://en.wikipedia.org/wiki/Data_deduplication)
		* [Exactly-once Semantics are Possible: Here’s How Kafka Does it](https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/)
		* [Real-time Deduping at Scale with Kafka-based Pipeline at Tapjoy](http://eng.tapjoy.com/blog-list/real-time-deduping-at-scale)
		* [Delivering Billions of Messages Exactly Once: Deduping at Segment](https://segment.com/blog/exactly-once-delivery/)		
* [Distributed Searching](http://nwds.cs.washington.edu/files/nwds/pdf/Distributed-WR.pdf)
	* [Search Architecture of Instagram](https://engineering.instagram.com/search-architecture-eeb34a936d3a)
	* [Search Architecture of eBay](http://www.cs.otago.ac.nz/homepages/andrew/papers/2017-8.pdf)
	* [Improving Search Engine Efficiency by over 25% at eBay](https://www.ebayinc.com/stories/blogs/tech/making-e-commerce-search-faster/)
	* [Elasticsearch Performance Tuning Practice at eBay](https://www.ebayinc.com/stories/blogs/tech/elasticsearch-performance-tuning-practice-at-ebay/)
	* [Nautilus: Travel Search Engine of Expedia](http://blog.expedia.com/expedias-nautilus-travel-search-engine-overview-and-applications/)
	* [Galene: Search Architecture of LinkedIn](https://engineering.linkedin.com/search/did-you-mean-galene)
	* [Search at Slack](https://slack.engineering/search-at-slack-431f8c80619e)
	* [Search Service (Half a Trillion Documents and Query Average Latency < 100ms) at Twitter (2014)](https://blog.twitter.com/engineering/en_us/a/2014/building-a-complete-tweet-index.html)
	* [Manas: High Performing Customized Search System at Pinterest](https://medium.com/@Pinterest_Engineering/manas-a-high-performing-customized-search-system-cf189f6ca40f)
	* [Sherlock: Near Real Time Search Indexing at Flipkart](https://tech.flipkart.com/sherlock-near-real-time-search-indexing-95519783859d)
	* [Nebula: Storage Platform to Build Search Backends at Airbnb](https://medium.com/airbnb-engineering/nebula-as-a-storage-platform-to-build-airbnbs-search-backends-ecc577b05f06)
	* [Elasticsearch at Kickstarter](https://kickstarter.engineering/elasticsearch-at-kickstarter-db3c487887fc)
* [Distributed Storage](http://highscalability.com/blog/2011/11/1/finding-the-right-data-solution-for-your-application-in-the.html)
	* [In-memory Storage](https://medium.com/@denisanikin/what-an-in-memory-database-is-and-how-it-persists-data-efficiently-f43868cff4c1)
		* [Introduction to In-memory Data - Viktor Gamov, Solutions Architect at Hazelcast](https://www.infoq.com/presentations/in-memory-data)
		* [Optimizing Memcached Efficiency at Quora](https://engineering.quora.com/Optimizing-Memcached-Efficiency)
		* [Real-Time Data Warehouse with MemSQL on Cisco UCS](https://blogs.cisco.com/datacenter/memsql)
		* [Moving to MemSQL at Tapjoy: Horizontally Scalable, ACID Compliant, MySQL Compatibility](http://eng.tapjoy.com/blog-list/moving-to-memsql)
	* [Durable Storage (Amazon S3)](http://www.datacenterknowledge.com/archives/2013/10/04/object-storage-the-future-of-scale-out)
		* [Reasons for Choosing S3 over HDFS at Databricks](https://databricks.com/blog/2017/05/31/top-5-reasons-for-choosing-s3-over-hdfs.html)
		* [S3 in the Data Infrastructure at Airbnb](https://medium.com/airbnb-engineering/data-infrastructure-at-airbnb-8adfb34f169c)
		* [Quantcast File System on Amazon S3](https://www.quantcast.com/blog/quantcast-file-system-on-amazon-s3/)
		* [Using S3 in Netflix Chukwa](https://medium.com/netflix-techblog/evolution-of-the-netflix-data-pipeline-da246ca36905)	
		* [Yahoo Cloud Object Store - Object Storage at Exabyte Scale](https://yahooeng.tumblr.com/post/116391291701/yahoo-cloud-object-store-object-storage-at)
		* [Ambry: Distributed Immutable Object Store at LinkedIn](https://www.usenix.org/conference/srecon17americas/program/presentation/shenoy)
		* [Hammerspace: Persistent, Concurrent, Off-heap Storage at Airbnb](https://medium.com/airbnb-engineering/hammerspace-persistent-concurrent-off-heap-storage-3db39bb04472)
* [Distributed Version Control](https://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/)
	* [Distributed Version Control Systems: A Not-So-Quick Guide Through](https://www.infoq.com/articles/dvcs-guide)
	* [Distributed Git Server at Palantir](https://medium.com/@palantir/stemma-distributed-git-server-70afbca0fc29)
	* [Git Configuration Files (Dotfiles) Distribution at Booking.com](https://blog.booking.com/dotfiles-distribution-at-booking.com.html)
	* [Configuration Management for Distributed Systems (using GitHub and cfg4j) at Flickr](https://code.flickr.net/2016/03/24/configuration-management-for-distributed-systems-using-github-and-cfg4j/)
	* [Git Repo at Microsoft](https://blogs.msdn.microsoft.com/bharry/2017/05/24/the-largest-git-repo-on-the-planet/)		
* [NoSQL](https://www.thoughtworks.com/insights/blog/nosql-databases-overview)
	* [Key-Value Databases (DynamoDB, Voldemort, Manhattan)](http://highscalability.com/anti-rdbms-list-distributed-key-value-stores)
		* [Scaling Mapbox infrastructure with DynamoDB Streams](https://blog.mapbox.com/scaling-mapbox-infrastructure-with-dynamodb-streams-d53eabc5e972)
		* [Manhattan: Twitter’s distributed key-value database](https://blog.twitter.com/engineering/en_us/a/2014/manhattan-our-real-time-multi-tenant-distributed-database-for-twitter-scale.html)
		* [Sherpa: Yahoo’s distributed NoSQL key-value store](https://yahooeng.tumblr.com/post/120730204806/sherpa-scales-new-heights)
		* [Riak inside Chat Service Architecture at Riot Games](https://engineering.riotgames.com/news/chat-service-architecture-persistence)
		* [MPH: Fast and Compact Immutable Key-Value Stores at Indeed](http://engineering.indeedblog.com/blog/2018/02/indeed-mph/)
		* [zBase: High Performance, Elastic, Distributed Key-Value Store at Zynga](https://www.zynga.com/blogs/engineering/zbase-high-performance-elastic-distributed-key-value-store-2)
	* [Column Databases (Cassandra, HBase, Vertica, Sybase IQ)](https://aws.amazon.com/nosql/columnar/)
		* [Consistent Hashing in Cassandra](https://blog.imaginea.com/consistent-hashing-in-cassandra/)
		* [When NOT to use Cassandra?](https://stackoverflow.com/questions/2634955/when-not-to-use-cassandra)
		* [Avoid Pitfalls in Scaling Cassandra Cluster: Lessons and Remedies at Walmart Labs](https://medium.com/walmartlabs/avoid-pitfalls-in-scaling-your-cassandra-cluster-lessons-and-remedies-a71ca01f8c04)
		* [Storing Images in Cassandra at Walmart Scale](https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593)
		* [Cassandra at Instagram](https://www.slideshare.net/DataStax/cassandra-at-instagram-2016)
		* [How Yelp Scaled Ad Analytics with Cassandra](https://engineeringblog.yelp.com/2016/08/how-we-scaled-our-ad-analytics-with-cassandra.html)
		* [How Discord Stores Billions of Messages with Cassandra](https://blog.discordapp.com/how-discord-stores-billions-of-messages-7fa6ec7ee4c7)
		* [Scale to serve 100+ million reads/writes using Spark and Cassandra at Dream11](https://medium.com/dream11-tech-blog/leaderboard-dream11-4efc6f93c23e)
		* [Imgur Notifications: From MySQL to HBase at Imgur](https://blog.imgur.com/2015/09/15/tech-tuesday-imgur-notifications-from-mysql-to-hbase/)
		* [Moving Food Feed from Redis to Cassandra at Zomato](https://www.zomato.com/blog/how-we-moved-our-food-feed-from-redis-to-cassandra)
		* [Benchmarking Cassandra Scalability on AWS at Netflix](https://medium.com/netflix-techblog/benchmarking-cassandra-scalability-on-aws-over-a-million-writes-per-second-39f45f066c9e)
	* [Document Databases (MongoDB, SimpleDB, CouchDB)](https://msdn.microsoft.com/en-us/magazine/hh547103.aspx)
		* [eBay: Building Mission-Critical Multi-Data Center Applications with MongoDB](https://www.mongodb.com/blog/post/ebay-building-mission-critical-multi-data-center-applications-with-mongodb)
		* [MongoDB at Baidu: Multi-Tenant Cluster Storing 200+ Billion Documents across 160 Shards](https://www.mongodb.com/blog/post/mongodb-at-baidu-powering-100-apps-across-600-nodes-at-pb-scale)
		* [The AWS and MongoDB Infrastructure of Parse (acquired by Facebook)](https://medium.baqend.com/parse-is-gone-a-few-secrets-about-their-infrastructure-91b3ab2fcf71)
		* [Migrating Mountains of Mongo Data at Addepar](https://medium.com/build-addepar/migrating-mountains-of-mongo-data-63e530539952)
		* [Couchbase Ecosystem at LinkedIn](https://engineering.linkedin.com/blog/2017/12/couchbase-ecosystem-at-linkedin)
		* [SimpleDB at Zendesk](https://medium.com/zendesk-engineering/resurrecting-amazon-simpledb-9404034ec506)
	* [Graph Databases](https://www.ibm.com/developerworks/library/cl-graph-database-1/index.html)
		* [Handling Billions of Edges in a Graph Database](https://www.infoq.com/presentations/graph-database-scalability)		
		* [Neo4j case studies with Walmart, eBay, Airbnb, NASA, etc.](https://neo4j.com/customers/)
		* [FlockDB: Distributed Graph Database for Storing Adjacency Lists at Twitter](https://blog.twitter.com/engineering/en_us/a/2010/introducing-flockdb.html)
		* [JanusGraph: Scalable Graph Database backed by Google, IBM and Hortonworks](https://architecht.io/google-ibm-back-new-open-source-graph-database-project-janusgraph-1d74fb78db6b)
		* [Amazon Neptune](https://aws.amazon.com/neptune/)
	* [Data Structure Databases (Redis, Hazelcast)](https://db-engines.com/en/system/Hazelcast%3BMemcached%3BRedis)
		* [Using Redis To Scale at Twitter](http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html)
		* [Scaling Job Queue with Redis at Slack](https://slack.engineering/scaling-slacks-job-queue-687222e9d100)
		* [Moving persistent data out of Redis at GitHub](https://githubengineering.com/moving-persistent-data-out-of-redis/)
		* [Storing Hundreds of Millions of Simple Key-Value Pairs in Redis at Instagram](https://engineering.instagram.com/storing-hundreds-of-millions-of-simple-key-value-pairs-in-redis-1091ae80f74c)
		* [Redis in Chat Architecture of Twitch (from 27:22)](https://www.infoq.com/presentations/twitch-pokemon)
		* [Learn Redis the hard way (in production) at Trivago](http://tech.trivago.com/2017/01/25/learn-redis-the-hard-way-in-production/)
		* [Optimizing Session Key Storage in Redis at Deliveroo](https://deliveroo.engineering/2016/10/07/optimising-session-key-storage.html)
		* [Optimizing Redis Storage at Deliveroo](https://deliveroo.engineering/2017/01/19/optimising-membership-queries.html)		
* [RDBMS (MySQL, MSSQL, PostgreSQL)](https://www.mysql.com/products/cluster/scalability.html)
	* [MS SQL versus MySQL](https://www.upwork.com/hiring/data/sql-vs-mysql-which-relational-database-is-right-for-you/)
	* [SQL Database Performance Tuning](https://www.toptal.com/sql-server/sql-database-tuning-for-developers)
	* [Scaling Distributed Joins](http://blog.memsql.com/scaling-distributed-joins/)
	* [Why SQL is beating NoSQL, and what this means for the future of data](https://blog.timescale.com/why-sql-beating-nosql-what-this-means-for-future-of-data-time-series-database-348b777b847a)
	* [MySQL Crash-Safe Replication, Parallel Replication, and Slave Scaling (10 parts) at Booking.com](https://blog.booking.com/author/jean-francois-gagne.html)
	* [Sharding MySQL at Pinterest](https://medium.com/@Pinterest_Engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f)
	* [Sharding MySQL at MailChimp](https://devs.mailchimp.com/blog/using-shards-to-accommodate-millions-of-users/)
	* [How Airbnb Partitioned Main MySQL Database in Two Weeks](https://medium.com/airbnb-engineering/how-we-partitioned-airbnb-s-main-database-in-two-weeks-55f7e006ff21)
	* [Replication is the Key for Scalability & High Availability](http://basho.com/posts/technical/replication-is-the-key-for-scalability-high-availability/)
	* [How Twitch uses PostgreSQL](https://blog.twitch.tv/how-twitch-uses-postgresql-c34aa9e56f58)
	* [Scaling MySQL-based financial reporting system at Airbnb](https://medium.com/airbnb-engineering/tracking-the-money-scaling-financial-reporting-at-airbnb-6d742b80f040)
	* [Scaling to 100M at Wix: MySQL is a Better NoSQL](https://www.wix.engineering/single-post/scaling-to-100m-mysql-is-a-better-nosql)
	* [Why Uber Engineering Switched from Postgres to MySQL](https://eng.uber.com/mysql-migration/)
	* [Handling Growth with Postgres at Instagram](https://engineering.instagram.com/handling-growth-with-postgres-5-tips-from-instagram-d5d7e7ffdfcb)
	* [Scaling the Analytics Database (Postgres) at TransferWise](http://tech.transferwise.com/scaling-our-analytics-database/)
	* [MySQL Sharding (3 parts) at Evernote](https://blog.evernote.com/tech/2015/10/08/the-great-shard-migration-part-ii/)
* [Time Series Database (TSDB)](https://www.influxdata.com/time-series-database/)
	* [Time Series Data: Why and How to Use a Relational Database instead of NoSQL](https://blog.timescale.com/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c)
	* [Beringei: High-performance Time Series Storage Engine at Facebook](https://code.facebook.com/posts/952820474848503/beringei-a-high-performance-time-series-storage-engine/)	
	* [Atlas: In-memory Dimensional Time Series Database at Netflix](https://medium.com/netflix-techblog/introducing-atlas-netflixs-primary-telemetry-platform-bd31f4d8ed9a)
	* [Heroic: Time Series Database at Spotify](https://labs.spotify.com/2015/11/17/monitoring-at-spotify-introducing-heroic/)
	* [Roshi: Distributed Storage System for Time-Series Event at SoundCloud](https://developers.soundcloud.com/blog/roshi-a-crdt-system-for-timestamped-events)
	* [Building a Scalable Time Series Database on PostgreSQL](https://blog.timescale.com/when-boring-is-awesome-building-a-scalable-time-series-database-on-postgresql-2900ea453ee2)
	* [Scaling Time Series Data Storage at Netflix](https://medium.com/netflix-techblog/scaling-time-series-data-storage-part-i-ec2b6d44ba39)
* [HTTP Caching (Reverse Proxy, CDN)](https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching)
	* [Reverse Proxy (Nginx, Varnish, Squid, rack-cache)](https://www.mertech.com/overview-reverse-proxying/)
	* [Stop Worrying and Love the Proxy](https://blog.turbinelabs.io/how-we-learned-to-stop-worrying-and-love-the-proxy-89af98fabaf8)
	* [Playing HTTP Tricks with Nginx](https://www.elastic.co/blog/playing-http-tricks-nginx)
	* [Using CDN to Improve Site Performance at Coursera](https://building.coursera.org/blog/2015/07/09/improving-coursera-global-site-performance-a-head-to-head-cdn-battle-with-production-traffic/)
	* [Strategy: Caching 404s Saved 66% On Server Time at The Onion](http://highscalability.com/blog/2010/3/26/strategy-caching-404s-saved-the-onion-66-on-server-time.html)
	* [Increasing Application Performance with HTTP Cache Headers](https://devcenter.heroku.com/articles/increasing-application-performance-with-http-cache-headers)
	* [Zynga Geo Proxy: Reducing Mobile Game Latency at Zynga](https://www.zynga.com/blogs/engineering/zynga-geo-proxy-reducing-mobile-game-latency)
	* [Google AMP at Condé Nast](https://technology.condenast.com/story/the-why-and-how-of-google-amp-at-conde-nast)
	* [Running A/B Tests on Hosting Infrastructure (CDNs) at Deliveroo](https://deliveroo.engineering/2016/09/19/ab-testing-cdns.html)
	* [HAProxy with Kubernetes for User-facing Traffic at SoundCloud](https://developers.soundcloud.com/blog/how-soundcloud-uses-haproxy-with-kubernetes-for-user-facing-traffic)
* [Load Balancing](https://blog.vivekpanyam.com/scaling-a-web-service-load-balancing/)
	* [Introduction to Modern Network Load Balancing and Proxying](https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236)
	* [Load Balancing infrastructure to support more than 1.3 billion users at Facebook](https://www.usenix.org/conference/srecon15europe/program/presentation/shuff)
	* [DHCPLB: Open Source Load Balancer for DHCP at Facebook](https://code.facebook.com/posts/1734309626831603/dhcplb-an-open-source-load-balancer/)
	* [Load Balancing with Eureka at Netflix](https://medium.com/netflix-techblog/netflix-shares-cloud-load-balancing-and-failover-tool-eureka-c10647ef95e5)
	* [Load Balancing at Yelp](https://engineeringblog.yelp.com/2017/05/taking-zero-downtime-load-balancing-even-further.html)
	* [Load Balancing at GitHub](https://githubengineering.com/introducing-glb/)
	* [Consistent Hashing to Improve Load Balancing at Vimeo](https://medium.com/vimeo-engineering-blog/improving-load-balancing-with-a-new-consistent-hashing-algorithm-9f1bd75709ed)
	* [UDP Load Balancing at 500px](https://developers.500px.com/udp-load-balancing-with-keepalived-167382d7ad08)
* [Autoscaling](https://medium.com/@BotmetricHQ/top-11-hard-won-lessons-learned-about-aws-auto-scaling-5bfe56da755f)
	* [A Horror Movie Featuring Auto Scaling Groups, EBS Volumes, Terraform, and Bash](https://blog.gruntwork.io/yak-shaving-series-1-all-i-need-is-a-little-bit-of-disk-space-6e5ef1644f67)
	* [Autoscaling Pinterest](https://medium.com/@Pinterest_Engineering/auto-scaling-pinterest-df1d2beb4d64)
	* [Autoscaling Based on Request Queuing at Square](https://medium.com/square-corner-blog/autoscaling-based-on-request-queuing-c4c0f57f860f)
	* [Autoscaling Applications at PayPal](https://www.paypal-engineering.com/2017/08/16/autoscaling-applications-paypal/)
	* [Autoscaling Jenkins at Trivago](http://tech.trivago.com/2017/02/17/your-definite-guide-for-autoscaling-jenkins/)
	* [Scryer: Predictive Auto Scaling Engine at Netflix](https://medium.com/netflix-techblog/scryer-netflixs-predictive-auto-scaling-engine-a3f8fc922270)
* [Concurrency](http://joeduffyblog.com/2016/11/30/15-years-of-concurrency/)
	* [Message-Passing Concurrency](https://link.springer.com/chapter/10.1007/978-3-642-35170-9_11)
	* [Software Transactional Memory](https://dl.acm.org/citation.cfm?id=3037750)
	* [Dataflow Concurrency](http://www.marketwired.com/press-release/java-concurrency-and-scalability-platform-akka-celebrates-fifth-anniversary-1928674.htm)
	* [Shared-State Concurrency](https://common-lisp.net/project/ssc/darcs/spec/specification.pdf)
	* [Concurrency series by Larry Osterman (Principal SDE at Microsoft)](https://social.msdn.microsoft.com/Profile/Larry%2bOsterman%2b%5BMSFT%5D/activity)
		* [Part 8 – Concurrency for scalability](https://blogs.msdn.microsoft.com/larryosterman/2005/02/28/concurrency-part-8-concurrency-for-scalability/)
		* [Part 9 - APIs that enable scalable programming](https://blogs.msdn.microsoft.com/larryosterman/2005/03/02/concurrency-part-9-apis-that-enable-scalable-programming/)
		* [Part 10 - How do you know if you’ve got a scalability issue?](https://blogs.msdn.microsoft.com/larryosterman/2005/03/03/concurrency-part-10-how-do-you-know-if-youve-got-a-scalability-issue/)
		* [Part 11 – Hidden scalability issues](https://blogs.msdn.microsoft.com/larryosterman/2005/03/04/concurrency-part-11-hidden-scalability-issues/)
		* [Part 12 – Hidden scalability issues (cont)](https://blogs.msdn.microsoft.com/larryosterman/2005/03/07/concurrency-part-12-hidden-scalability-issues-part-2/)
	* [Concurrency with Erlang](http://learnyousomeerlang.com/the-hitchhikers-guide-to-concurrency)
		* [Erlang in WhatsApp](https://blog.whatsapp.com/196/1-million-is-so-2011)
		* [Erlang in Riot Chat Server](https://engineering.riotgames.com/news/chat-service-architecture-servers)
		* [How Discord Scaled Elixir to Five Million Concurrent Users](https://blog.discordapp.com/scaling-elixir-f9b8e1e7c29b)
		* [Mnesia: A Distributed DBMS Rooted in Concurrency](https://www.developer.com/db/article.php/3864331/Mnesia-A-Distributed-DBMS-Rooted-in-Concurrency.htm)
		* [Mnesia and CAP](https://medium.com/@jlouis666/mnesia-and-cap-d2673a92850)
	* [Running Concurrent Queries in GoSocial (Go and Neo4j) at Medium](https://medium.engineering/running-concurrent-queries-in-gosocial-28e5841b05b5)
	* [The Secret To 10 Million Concurrent Connections](http://highscalability.com/blog/2013/5/13/the-secret-to-10-million-concurrent-connections-the-kernel-i.html)
* [Parallel Computing](https://blogs.msdn.microsoft.com/ddperf/2009/05/02/are-we-taking-advantage-of-parallelism/)
	* [SPMD (Single Program Multiple Data): The Genetic Pattern](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-186.html)
	* [Master/Worker Pattern](https://docs.gigaspaces.com/sbp/master-worker-pattern.html)
	* [Loop Parallelism Pattern: Extracting parallel tasks from loops](https://www.cs.umd.edu/class/fall2001/cmsc411/projects/unroll/main.htm)
	* [Fork/Join Pattern: Good for recursive data processing](http://highscalability.com/learn-how-exploit-multiple-cores-better-performance-and-scalability)
	* [Map-Reduce: Born for Simplified Data Processing on Large Clusters](http://static.googleusercontent.com/media/research.google.com/en/us/archive/mapreduce-osdi04.pdf)
	* [On the Death of Map-Reduce - Henry Robinson, Cloudera](http://the-paper-trail.org/blog/the-elephant-was-a-trojan-horse-on-the-death-of-map-reduce-at-google/)
	* [Server-side Optimization to Parallelize the Rendering of Web Pages at Yelp](https://engineeringblog.yelp.com/2017/07/generating-web-pages-in-parallel-with-pagelets.html)
* [Event-Driven Architecture](https://martinfowler.com/articles/201701-event-driven.html)
	* [Stream Processing, Event Sourcing, Reactive, CEP, etc and Making sense of it all - Martin Kleppmann](https://www.confluent.io/blog/making-sense-of-stream-processing/)
	* [Messaging](https://www.ibm.com/support/knowledgecenter/en/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/cjt1004_.html)
		* [Publish-Subscribe](https://aws.amazon.com/pub-sub-messaging/)
			* [Autoscaling Pub-Sub Consumers at Spotify](https://labs.spotify.com/2017/11/20/autoscaling-pub-sub-consumers/)
			* [Pulsar: Pub-Sub Messaging at Scale at Yahoo](https://yahooeng.tumblr.com/post/150078336821/open-sourcing-pulsar-pub-sub-messaging-at-scale)
			* [Wormhole: Pub-Sub system at Facebook (2013)](https://code.facebook.com/posts/188966771280871/wormhole-pub-sub-system-moving-data-through-space-and-time/)
			* [Pub-Sub in Chatting Architecture on LINE LIVE](https://engineering.linecorp.com/en/blog/detail/85)
		* [Point-To-Point and Its Differences from Pub-Sub](https://www.journaldev.com/9743/jms-messaging-models)
		* [Store-Forward](https://docs.oracle.com/cd/E13222_01/wls/docs91/saf_admin/overview.html)
		* [Request-Reply](https://docs.tibco.com/pub/ftl/4.3.0/doc/html/GUID-A64ABED1-682E-4E1D-A94A-5590CB91B9BB.html)
	* [Enterprise Service Bus](http://www.oracle.com/technetwork/articles/soa/ind-soa-esb-1967705.html)
	* [Domain Events](https://martinfowler.com/eaaDev/DomainEvent.html)
		* [Domain Events: Simple and Reliable Solution](http://enterprisecraftsmanship.com/2017/10/03/domain-events-simple-and-reliable-solution/)
	* [Event Stream Processing](https://www.sas.com/en_us/insights/articles/big-data/3-things-about-event-stream-processing.html)
		* [Kafka Streams on Heroku](https://blog.heroku.com/kafka-streams-on-heroku)
		* [Kafka in Platform Events Architecture at Salesforce](https://engineering.salesforce.com/how-apache-kafka-inspired-our-platform-events-architecture-2f351fe4cf63)		
		* [Bullet: Forward-Looking Query Engine for Streaming Data at Yahoo](https://yahooeng.tumblr.com/post/161855616651/open-sourcing-bullet-yahoos-forward-looking)
		* [Benchmarking Streaming Computation Engines at Yahoo](https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at)
	* [Event Sourcing](https://martinfowler.com/eaaDev/EventSourcing.html)
		* [Event Sourced Architectures for High Availability](https://www.infoq.com/presentations/Event-Sourced-Architectures-for-High-Availability)
		* [Event Sourcing and Stream Processing at Scale](https://martin.kleppmann.com/2016/01/29/event-sourcing-stream-processing-at-ddd-europe.html)
		* [Scaling Event Sourcing for Netflix Downloads](https://www.infoq.com/presentations/netflix-scale-event-sourcing)
		* [Scaling Event-Sourcing at Jet.com](https://medium.com/@eulerfx/scaling-event-sourcing-at-jet-9c873cac33b8)
	* [Command & Query Responsibility Segregation (CQRS)](https://docs.microsoft.com/en-us/azure/architecture/patterns/cqrs)
		* [Exploring CQRS and Event Sourcing - MSDN (with free ebook)](https://msdn.microsoft.com/en-us/library/jj554200.aspx)
		* [CQRS Simple Architecture](https://www.future-processing.pl/blog/cqrs-simple-architecture/)
		* [Building Scalable Applications Using Event Sourcing and CQRS with Kafka](https://initiate.andela.com/event-sourcing-and-cqrs-a-look-at-kafka-e0c1b90d17d8)	
* [Distributed Machine Learning](https://arxiv.org/pdf/1512.09295.pdf)
	* [Scalable Deep Learning Platform On Spark In Baidu](https://www.slideshare.net/JenAman/scalable-deep-learning-platform-on-spark-in-baidu)
	* [Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow](https://eng.uber.com/horovod/)	
	* [Scaling Gradient Boosted Trees for Click-Through-Rate Prediction at Yelp](https://engineeringblog.yelp.com/2018/01/building-a-distributed-ml-pipeline-part1.html)
	* [TensorFlowOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo](https://yahooeng.tumblr.com/post/157196488076/open-sourcing-tensorflowonspark-distributed-deep)
	* [CaffeOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo](https://yahooeng.tumblr.com/post/139916828451/caffeonspark-open-sourced-for-distributed-deep)
	* [AIOps in Practice at Baidu](https://www.usenix.org/conference/srecon17asia/program/presentation/qu)
	* [Learning with Privacy at Scale - Differential Privacy Team, Apple](https://machinelearning.apple.com/2017/12/06/learning-with-privacy-at-scale.html)
	* [Image Classification Experiment Using Deep Learning at Mercari](https://medium.com/mercari-engineering/mercaris-image-classification-experiment-using-deep-learning-9b4e994a18ec)
	* [Content-based Video Relevance Prediction at Hulu](https://medium.com/hulu-tech-blog/content-based-video-relevance-prediction-b2c448e14752)
	* [PaddlePaddle Fluid: Elastic Deep Learning on Kubernetes at Baidu](http://research.baidu.com/paddlepaddle-fluid-elastic-deep-learning-kubernetes/)
	* [Training ML Models with Airflow and BigQuery at WePay](https://wecode.wepay.com/posts/training-machine-learning-models-with-airflow-and-bigquery)
	* [Improving Photo Selection With Deep Learning at TripAdvisor](http://engineering.tripadvisor.com/improving-tripadvisor-photo-selection-deep-learning/)
	* [Machine Learning (2 parts) at Condé Nast](https://technology.condenast.com/story/handbag-brand-and-color-detection)
	* [Machine Learning Applications In The E-commerce Domain (4 parts) at Rakuten](https://techblog.rakuten.co.jp/2017/07/12/machine-learning-applications-in-the-e-commerce-domain-4/)
	* [Venue Rating System at Foursquare](https://engineering.foursquare.com/finding-the-perfect-10-how-we-developed-the-foursquare-venue-rating-system-c76b08f7b9b3)	
* [Distributed Architecture in Financial Systems](https://medium.com/@sofie_4036/lets-build-a-bank-service-architecture-410dca881291)
	* [Building a Modern Bank Backend at Monzo](https://monzo.com/blog/2016/09/19/building-a-modern-bank-backend/)
	* [Choosing an Architecture for Core Banking System at TrustBK](https://blog.trustbk.com/choosing-an-architecture-85750e1e5a03)
	* [Reinventing the Trading Platform for Scale at Wealthsimple](https://medium.com/@Wealthsimple/engineering-at-wealthsimple-reinventing-our-trading-platform-for-scale-17e332241b6c)
	* [Tech Stack at TransferWise](http://tech.transferwise.com/the-transferwise-stack-heartbeat-of-our-little-revolution/)			
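
The Load Balancing readings above mention consistent hashing. As a rough illustration only, here is a minimal hash ring with virtual nodes. It is a sketch under stated assumptions, not the algorithm from any linked article: the `HashRing` class, the `replicas` parameter, and the node names are all hypothetical, and SHA-256 is just a convenient stable hash.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points placed on the ring per node
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # Any stable, well-distributed hash works; SHA-256 is an assumption.
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            if point in self._owner:
                continue  # extremely unlikely collision: keep the first owner
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # The first point clockwise from the key's hash owns the key;
        # the modulo wraps around past the largest point.
        idx = bisect.bisect_right(self._points, self._hash(key))
        return self._owner[self._points[idx % len(self._points)]]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.lookup("user-42"))
```

Removing a node only remaps the keys that node owned. Keys on the other nodes stay where they were, which is the property that makes the technique attractive for load balancing and caching.
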

## Availability
* [Failover](http://cloudpatterns.org/mechanisms/failover_system)
	* [The Evolution of Global Traffic Routing and Failover](https://www.usenix.org/conference/srecon16/program/presentation/heady)
	* [Testing for Disaster Recovery Failover Testing](https://www.usenix.org/conference/srecon17asia/program/presentation/liu_zehua)
	* [Designing a Microservices Architecture for Failure](https://blog.risingstack.com/designing-microservices-architecture-for-failure/)
* [Replication](https://m.alphasights.com/a-primer-on-database-replication-381b319cd032)
	* [Master-Slave](https://engineering.bitnami.com/articles/enabling-additional-nodes-to-bitnami-mysql-with-replication.html)
	* [Tree Replication](https://link.springer.com/chapter/10.1007/3-540-44863-2_47)
	* [Master-Master](http://sabbour.me/highly-available-and-scalable-master-master-mysql-on-azure-virtual-machines/)
	* [Buddy Replication](https://developer.jboss.org/wiki/JBossCacheBuddyReplicationDesign)
* [NodeJS High Availability at Yahoo](https://yahooeng.tumblr.com/post/68823943185/nodejs-high-availability)
* [Every Day Is Monday in Operations (11 parts) at LinkedIn](https://www.linkedin.com/pulse/introduction-every-day-monday-operations-benjamin-purgason)
* [Practical Guide to Monitoring and Alerting with Time Series at Scale](https://www.usenix.org/conference/srecon17americas/program/presentation/wilkinson)
* [How Robust Monitoring Powers High Availability for LinkedIn Feed](https://www.usenix.org/conference/srecon17americas/program/presentation/barot)
* [Architectural Patterns for High Availability - Adrian Cockcroft, Director of Architecture at Netflix](https://www.infoq.com/presentations/Netflix-Architecture)
* [Ensuring Resilience to Disaster at Quora](https://engineering.quora.com/Ensuring-Quoras-Resilience-to-Disaster)
* [Resiliency against Traffic Oversaturation at iHeartRadio](https://tech.iheart.com/resiliency-against-traffic-oversaturation-77c5ed92a5fb)
* [How Production Engineers Support Global Events at Facebook](https://code.facebook.com/posts/166966743929963/how-production-engineers-support-global-events-on-facebook/)

## Stability
* [Circuit Breaker](https://martinfowler.com/bliki/CircuitBreaker.html)
	* [Circuit Breaking in Distributed Systems](https://www.infoq.com/presentations/circuit-breaking-distributed-systems)
	* [Circuit Breakers for Distributed Services at LINE](https://engineering.linecorp.com/en/blog/detail/76)
	* [Applying Circuit Breaker to Channel Gateway at LINE](https://engineering.linecorp.com/en/blog/detail/78)
	* [Lessons in Resilience at SoundCloud](https://developers.soundcloud.com/blog/lessons-in-resilience-at-SoundCloud)
	* [Circuit Breaker for Scaling Containers](https://f5.com/about-us/blog/articles/the-art-of-scaling-containers-circuit-breakers-28919)
	* [Protector: Circuit Breaker for Time Series Databases at Trivago](http://tech.trivago.com/2016/02/23/protector/)
* [Always use timeouts (if possible)](https://www.javaworld.com/article/2824163/application-performance/stability-patterns-applied-in-a-restful-architecture.html)
* [Let it crash/Supervisors: Embrace failure as a natural state in the life-cycle of the application](http://erlang.org/doc/design_principles/sup_princ.html)
* [Crash early: An error now is better than a response tomorrow](http://odino.org/better-performance-the-case-for-timeouts/)
* [Bulkheads: Partition and tolerate failure in one part](https://skife.org/architecture/fault-tolerance/2009/12/31/bulkheads.html)
* [Steady state: Always put logs on separate disk](https://docs.microsoft.com/en-us/sql/relational-databases/policy-based-management/place-data-and-log-files-on-separate-drives)
* [Throttling: Maintain a steady pace](http://www.sosp.org/2001/papers/welsh.pdf)
* [Multi-clustering: Improving Resiliency and Stability of a Large-scale Monolithic API Service at LinkedIn](https://engineering.linkedin.com/blog/2017/11/improving-resiliency-and-stability-of-a-large-scale-api)
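
To make the circuit breaker entry above more concrete, here is a minimal sketch of the closed / open / half-open cycle. It assumes a single-threaded caller and is not taken from any of the linked articles: `CircuitBreaker`, `CircuitOpenError`, `max_failures`, `reset_timeout`, and `clock` are hypothetical names chosen for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Toy circuit breaker; not thread-safe, for illustration only."""

    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures    # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds to wait before a trial call
        self._clock = clock                 # injectable so tests can fake time
        self._failures = 0
        self._opened_at = None              # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Open: fail fast instead of hitting the struggling dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a placeholder for a real client function, and treat `CircuitOpenError` as a signal to serve a fallback. Pairing this with a timeout on the wrapped call keeps slow dependencies from tying up callers while the breaker is still closed.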

## Performance
* [Web Performance: Cache Efficiency Exercise at Facebook](https://code.facebook.com/posts/964122680272229/web-performance-cache-efficiency-exercise/)
* [Improving Performance with Background Data Prefetching at Instagram](https://engineering.instagram.com/improving-performance-with-background-data-prefetching-b191acb39898)
* [Compression Techniques to Solve Network I/O Bottlenecks at eBay](https://www.ebayinc.com/stories/blogs/tech/how-ebays-shopping-cart-used-compression-techniques-to-solve-network-io-bottlenecks/)
* [Optimizing Web Servers for High Throughput and Low Latency at Dropbox](https://blogs.dropbox.com/tech/2017/09/optimizing-web-servers-for-high-throughput-and-low-latency/)
* [Boosting Site Speed Using Brotli Compression at LinkedIn](https://engineering.linkedin.com/blog/2017/05/boosting-site-speed-using-brotli-compression)
* [Linux Performance Analysis in 60,000 Milliseconds at Netflix](https://medium.com/netflix-techblog/linux-performance-analysis-in-60-000-milliseconds-accc10403c55)
* [Optimizing 360 Photos at Scale at Facebook](https://code.facebook.com/posts/129055711052260/optimizing-360-photos-at-scale/)
* [Reducing Image File Size in the Photos Infrastructure at Etsy](https://codeascraft.com/2017/05/30/reducing-image-file-size-at-etsy/)
* [Improving Video Thumbnails with Deep Neural Nets at YouTube](https://youtube-eng.googleblog.com/2015/10/improving-youtube-video-thumbnails-with_8.html)
* [Optimizing APIs through Dynamic Polyglot Runtime, Fully Asynchronous, and Reactive Programming at Netflix](https://medium.com/netflix-techblog/optimizing-the-netflix-api-5c9ac715cf19)
* [Optimizing Video Playback Performance at Pinterest](https://medium.com/@Pinterest_Engineering/optimizing-video-playback-performance-caf55ce310d1)
* [Reducing Video Loading Time by Prefetching during Preroll at Dailymotion](http://engineering.dailymotion.com/reducing-video-loading-time-prefetching-video-during-preroll/)
* [Improving GIF Performance at Pinterest](https://medium.com/@Pinterest_Engineering/improving-gif-performance-on-pinterest-8dad74bf92f1)
* [Performance Improvements (All Stacks) at Pinterest](https://medium.com/@Pinterest_Engineering/driving-user-growth-with-performance-improvements-cfc50dafadd7)
* [Server Side Rendering at Wix](https://www.youtube.com/watch?v=f9xI2jR71Ms)
* [30x Performance Improvements on MySQLStreamer at Yelp](https://engineeringblog.yelp.com/2018/02/making-30x-performance-improvements-on-yelps-mysqlstreamer.html)
* [Performance Monitoring with Riemann and Clojure at Walmart](https://medium.com/walmartlabs/performance-monitoring-with-riemann-and-clojure-eafc07fcd375)
* [Improving Homepage Performance at Zillow](https://www.zillow.com/engineering/improving-homepage-performance/)
* [Decreasing RAM Usage by 40% Using jemalloc with Python & Celery at Zapier](https://zapier.com/engineering/celery-python-jemalloc/)
* [Using Java Large Heap (110 GB) for Boosting Site Performance at Expedia](https://techblog.expedia.com/2015/09/25/solving-problems-with-very-large-java-heaps/)

## Others
* [Architecture of Tripod (Flickr’s Backend)](https://yahooeng.tumblr.com/post/157200523046/introducing-tripod-flickrs-backend-refactored)
* [Architecture of SurveyMonkey](https://engineering.surveymonkey.com/2016/04/09/the-architecture-behind-surveymonkey/)
* [Architecture of Data Platform at Flipkart](https://tech.flipkart.com/overview-of-flipkart-data-platform-20c6d3e9a196)
* [Architecture of Stack Overflow Enterprise at Palantir](https://medium.com/@palantir/terraforming-stack-overflow-enterprise-in-aws-47ee431e6be7)
* [Architecture of Distributed Cron at Quora](https://engineering.quora.com/Quoras-Distributed-Cron-Architecture)
* [Simone: Distributed Simulation Service at Netflix](https://medium.com/netflix-techblog/https-medium-com-netflix-techblog-simone-a-distributed-simulation-service-b2c85131ca1b)
* [Seagull: Distributed System that Helps Running > 20 Million Tests Per Day at Yelp](https://engineeringblog.yelp.com/2017/04/how-yelp-runs-millions-of-tests-every-day.html)
* [Cloud Bouncer: Distributed Rate Limiting at Yahoo](https://yahooeng.tumblr.com/post/111288877956/cloud-bouncer-distributed-rate-limiting-at-yahoo)
* [Selecting a Cloud Provider at Etsy](https://codeascraft.com/2018/01/04/selecting-a-cloud-provider/)
* [Basic Infrastructure Patterns at Zenefits](https://engineering.zenefits.com/2016/02/basic-infrastructure-patterns/)
* [Syscall Auditing at Scale at Slack](https://slack.engineering/syscall-auditing-at-scale-e6a3ca8ac1b8)
* [Scaling Online Migrations at Stripe](https://stripe.com/blog/online-migrations)
* [Netflix: What Happens When You Press Play?](http://highscalability.com/blog/2017/12/11/netflix-what-happens-when-you-press-play.html)
* [Service Decomposition at Scale at Intuit QuickBooks](https://quickbooks-engineering.intuit.com/service-decomposition-at-scale-70405ac2f637)
* [Back-end at BlaBlaCar](http://blablatech.com/blog/BlaBlaTech-behind-the-scene)
* [Scalable Gaming Patterns on AWS](https://d0.awsstatic.com/whitepapers/aws-scalable-gaming-patterns.pdf)
* [How League Of Legends Scaled Chat To 70 Million Players](http://highscalability.com/blog/2014/10/13/how-league-of-legends-scaled-chat-to-70-million-players-it-t.html)
* [Scaling NodeJS at Alibaba](https://www.linux.com/blog/can-nodejs-scale-ask-team-alibaba)
* [Distributed Firewall at LinkedIn](https://www.youtube.com/watch?v=Kb_dU6t56mo)

## Talks
* [Distributed Systems in One Lesson - Tim Berglund, Senior Director of Developer Experience at Confluent](https://www.youtube.com/watch?v=Y6Ev8GIlbxc)
* [Building Real Time Infrastructure at Facebook - Jeff Barber and Shie Erlich, Software Engineers at Facebook](https://www.usenix.org/conference/srecon17americas/program/presentation/erlich)
* [Building Reliable Social Infrastructure for Google - Marc Alvidrez, Senior Manager at Google](https://www.usenix.org/conference/srecon16/program/presentation/alvidrez)
* [Site Reliability Engineering at Dropbox - Tammy Butow, Site Reliability Engineering Manager at Dropbox](https://www.youtube.com/watch?v=ggizCjUCCqE)
* [How Google Does Planet-Scale Engineering for Planet-Scale Infra - Melissa Binde, SRE Director for Google Cloud Platform](https://www.youtube.com/watch?v=H4vMcD7zKM0)
* [Netflix Guide to Microservices - Josh Evans, Director of Operations Engineering at Netflix](https://www.youtube.com/watch?v=CZ3wIuvmHeM&t=2837s)
* [Achieving Rapid Response Times in Large Online Services - Jeff Dean, Google Senior Fellow](https://www.youtube.com/watch?v=1-3Ahy7Fxsc)
* [Architecture to Handle 80K RPS Celebrity Sales at Shopify - Simon Eskildsen, Engineering Lead at Shopify](https://www.youtube.com/watch?v=N8NWDHgWA28)
* [Lessons of Scale at Facebook - Bobby Johnson, Director of Engineering at Facebook](https://www.youtube.com/watch?v=QCHiNEw73AU)
* [Performance Optimization for the Greater China Region at Salesforce - Jeff Cheng, Enterprise Architect at Salesforce](https://www.salesforce.com/video/1757880/)
* [How GIPHY Delivers a GIF to 300 Million Users - Alex Hoang and Nima Khoshini, Services Engineers at GIPHY](https://vimeo.com/252367076)
* [High Performance Packet Processing Platform at Alibaba - Haiyong Wang, Senior Director at Alibaba](https://www.youtube.com/watch?v=wzsxJqeVIhY&list=PLMu8-hpCxIVENuAue7bd0eCAglLGY_8AW&index=7)
* [Scaling Dropbox - Kevin Modzelewski, Back-end Engineer at Dropbox](https://www.youtube.com/watch?v=PE4gwstWhmc)
* [Scaling Reliability at Dropbox - Sat Kriya Khalsa, SRE at Dropbox](https://www.youtube.com/watch?v=IhGWOaD5BYQ)
* [Scaling Live Videos to a Billion Users at Facebook - Sachin Kulkarni, Director of Engineering at Facebook](https://www.youtube.com/watch?v=IO4teCbHvZw)
* [Scaling Infrastructure at Instagram - Lisa Guo, Instagram Engineering](https://www.youtube.com/watch?v=hnpzNAPiC0E)
* [Scaling Infrastructure at Twitter - Yao Yue, Staff Software Engineer at Twitter](https://www.youtube.com/watch?v=6OvrFkLSoZ0)
* [Scaling Infrastructure at Etsy - Bethany Macri, Engineering Manager at Etsy](https://www.youtube.com/watch?v=LfqyhM1LeIU)
* [Scaling Data Infrastructure at Spotify - Matti (Lepistö) Pehrs, Spotify](https://www.youtube.com/watch?v=cdsfRXr9pJU)
* [Scaling Pinterest - Marty Weiner, Pinterest’s founding engineer](https://www.youtube.com/watch?v=jQNCuD_hxdQ&list=RDhnpzNAPiC0E&index=11)
* [Scaling Slack - Bing Wei, Software Engineer (Infrastructure) at Slack](https://www.infoq.com/presentations/slack-scalability)
* [Scaling Backend at YouTube - Sugu Sougoumarane, SDE at YouTube](https://www.youtube.com/watch?v=5yDO-tmIoXY&feature=youtu.be)
* [Scaling Backend at Uber - Matt Ranney, Chief Systems Architect at Uber](https://www.youtube.com/watch?v=nuiLcWE8sPA)
* [Scaling Global CDN at Netflix - Dave Temkin, Director of Global Networks at Netflix](https://www.youtube.com/watch?v=tbqcsHg-Q_o)
* [Scaling Load Balancing Infra to Support 1.3 Billion Users at Facebook - Patrick Shuff, Production Engineer at Facebook](https://www.youtube.com/watch?v=bxhYNfFeVF4)
* [Scaling (a NSFW site) to 200 Million Views A Day And Beyond - Eric Pickup, Lead Platform Developer at MindGeek](https://www.youtube.com/watch?v=RlkCdM_f3p4)
* [Scaling Counting Infrastructure at Quora - Chun-Ho Hung and Nikhil Garg, SEs at Quora](https://www.infoq.com/presentations/quora-analytics)
* [Scaling Git at Microsoft - Saeed Noursalehi, Principal Program Manager at Microsoft](https://www.youtube.com/watch?v=g_MPGU_m01s)

## Books
* [Google Site Reliability Engineering (Online - Free)](https://landing.google.com/sre/book.html)
* [Distributed Systems for Fun and Profit (Online - Free)](http://book.mixu.net/distsys/)
* [What Every Developer Should Know About SQL Performance (Online - Free)](https://use-the-index-luke.com/sql/table-of-contents)
* [Beyond the Twelve-Factor App - Exploring the DNA of Highly Scalable, Resilient Cloud Applications (Free)](http://www.oreilly.com/webops-perf/free/beyond-the-twelve-factor-app.csp)
* [Chaos Engineering - Building Confidence in System Behavior through Experiments (Free)](http://www.oreilly.com/webops-perf/free/chaos-engineering.csp?intcmp=il-webops-free-product-na_new_site_chaos_engineering_text_cta)
* [The Art of Scalability](http://theartofscalability.com/)
* [Designing Data-Intensive Applications](https://dataintensive.net/)
* [Web Scalability for Startup Engineers](https://www.goodreads.com/book/show/23615147-web-scalability-for-startup-engineers)
* [Scalability Rules: 50 Principles for Scaling Web Sites](http://scalabilityrules.com/)

## Special Thanks
* Jonas Bonér, CTO at Lightbend, for the [original inspiration](https://www.slideshare.net/jboner/scalability-availability-stability-patterns)

## License

[![CC-BY](https://mirrors.creativecommons.org/presskit/buttons/88x31/svg/by.svg)](https://creativecommons.org/licenses/by/4.0/)

Copyright Benny (Quoc-Binh) Nguyen, 2018. Licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).
* [SQL versus NoSQL](https://www.upwork.com/hiring/data/sql-vs-nosql-databases-whats-the-difference/)
* [Practical NoSQL resilience design pattern for the enterprise (eBay)](https://www.ebayinc.com/stories/blogs/tech/practical-nosql-resilience-design-pattern-for-the-enterprise/)
* [SQL or NoSQL - Lesson Learned from Salesforce](https://engineering.salesforce.com/sql-or-nosql-9eaf1d92545b)
* [How Sharding Works](https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6)
* [Consistent Hashing - Tom White, author of 'Hadoop: the Definitive Guide'](http://www.tom-e-white.com/2007/11/consistent-hashing.html)
* [Uniform Consistent Hashing](https://medium.com/netflix-techblog/distributing-content-to-open-connect-3e3e391d4dc9)
* [Eventually Consistent - Werner Vogels, CTO at Amazon](https://www.allthingsdistributed.com/2008/12/eventually_consistent.html)
* [Cache is King!](https://www.stevesouders.com/blog/2012/10/11/cache-is-king/)
* [Anti-Caching](http://the-paper-trail.org/blog/paper-notes-anti-caching/)
* [Understand Latency](http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it)
* [Latency Numbers Every Programmer Should Know](http://norvig.com/21-days.html#answers)
* [Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO](http://highscalability.com/blog/2014/5/12/4-architecture-issues-when-scaling-web-applications-bottlene.html)
* [20 Common Bottlenecks](http://highscalability.com/blog/2012/5/16/big-list-of-20-common-bottlenecks.html)
* [Life Beyond Distributed Transactions](https://queue.acm.org/detail.cfm?id=3025012)
* [Relying on Software to Redirect Traffic Reliably at Various Layers](https://www.usenix.org/conference/srecon15/program/presentation/taveira)
* [Advantages and Drawbacks of Microservices](https://cloudacademy.com/blog/microservices-architecture-challenge-advantage-drawback/)
* [Microservices Scale Cube](http://microservices.io/articles/scalecube.html)
* [Breaking Things on Purpose](https://www.usenix.org/conference/srecon17americas/program/presentation/andrus)
* [Avoid Over Engineering](https://medium.com/@rdsubhas/10-modern-software-engineering-mistakes-bc67fbef4fc8)
* [Scalability Worst Practices](https://www.infoq.com/articles/scalability-worst-practices)
* [Use Solid Technologies - Don’t Re-invent the Wheel - Keep It Simple!](https://medium.com/@DataStax/instagram-engineerings-3-rules-to-a-scalable-cloud-application-architecture-c44afed31406)
* [Why Over-Reusing is Bad](http://tech.transferwise.com/why-over-reusing-is-bad/)
* [Performance is a Feature](https://blog.codinghorror.com/performance-is-a-feature/)
* [Make Performance Part of Your Workflow](https://codeascraft.com/2014/12/11/make-performance-part-of-your-workflow/)
* [The Benefits of Server Side Rendering Over Client Side Rendering](https://medium.com/walmartlabs/the-benefits-of-server-side-rendering-over-client-side-rendering-5d07ff2cefe8)
* [Writing Code that Scales](https://blog.rackspace.com/writing-code-that-scales)
* [Automate and Abstract: Lessons from Facebook on Engineering for Scale](https://architecht.io/lessons-from-facebook-on-engineering-for-scale-f5716f0afc7a)
* [AWS Do's and Don'ts](https://8thlight.com/blog/sarah-sunday/2017/09/15/aws-dos-and-donts.html)
* [(UI) Design Doesn’t Scale - Stanley Wood, Design Director at Spotify](https://medium.com/@hellostanley/design-doesnt-scale-4d81e12cbc3e)
* [Design for Loose-coupling](http://bulgerpartners.com/how-loosely-coupled-architectures-are-helping-the-modernization-of-legacy-software/)
* [Design for Resiliency](http://highscalability.com/blog/2012/12/31/designing-for-resiliency-will-be-so-2013.html)
* [Design for Self-healing](https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/self-healing)
* [Design for Scaling Out](https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/scale-out)
* [Best Practices for Scaling Out](https://blog.openshift.com/best-practices-for-horizontal-application-scaling/)
* [Design for Evolution](https://docs.microsoft.com/en-us/azure/architecture/guide/design-principles/design-for-evolution)
* [Learn from Mistakes](http://highscalability.com/blog/2013/8/26/reddit-lessons-learned-from-mistakes-made-scaling-to-1-billi.html)
* [Linux Performance](http://www.brendangregg.com/linuxperf.html)
* [How To Design A Good API and Why it Matters - Joshua Bloch](https://www.infoq.com/presentations/effective-api-design)
* [Talks/Papers on Efficiency, Reliability, Scaling - James Hamilton, VP and Distinguished Engineer at AWS](http://mvdirona.com/jrh/work/)

## Scalability
* [Microservices and Orchestration](https://hackernoon.com/microservices-are-hard-an-invaluable-guide-to-microservices-2d06bd7bcf5d)
	* [Microservices Resource Guide - Martin Fowler, Chief Scientist at ThoughtWorks](https://martinfowler.com/microservices/)
	* [Microservices Patterns](http://microservices.io/patterns/)
	* [Thinking Inside the Container (8 parts) at Riot Games](https://engineering.riotgames.com/news/thinking-inside-container)
	* [Containerization at Pinterest](https://medium.com/@Pinterest_Engineering/containerization-at-pinterest-92295347f2f3)
	* [Techniques for Splitting Up a Codebase into Microservices and Artifacts at LinkedIn](https://engineering.linkedin.com/blog/2016/02/q-a-with-jim-brikman--splitting-up-a-codebase-into-microservices)
	* [The Evolution of Container Usage at Netflix](https://medium.com/netflix-techblog/the-evolution-of-container-usage-at-netflix-3abfc096781b)
	* [Dockerizing MySQL at Uber](https://eng.uber.com/dockerizing-mysql/)
	* [Testing of Microservices at Spotify](https://labs.spotify.com/2018/01/11/testing-of-microservices/)
	* [Organize Monolith Before Breaking it into Services at Weebly](https://medium.com/weebly-engineering/how-to-organize-your-monolith-before-breaking-it-into-services-69cbdb9248b0)
	* [Lessons learned running Docker in production at Treehouse](https://medium.com/treehouse-engineering/lessons-learned-running-docker-in-production-5dce99ece770)
	* [Inside a SoundCloud Microservice](https://developers.soundcloud.com/blog/inside-a-soundcloud-microservice)
	* [Microservices at BlaBlaCar](http://blablatech.com/blog/micro-service-at-blablacar)
	* [Operate Kubernetes Reliably at Stripe](https://stripe.com/blog/operating-kubernetes)
	* [Kubernetes Traffic Routing (2 parts) at Rakuten](https://techblog.rakuten.co.jp/2017/09/28/k8s-routing2/)
	* [Agrarian-Scale Kubernetes (3 parts) at New York Times](https://open.nytimes.com/agrarian-scale-kubernetes-part-3-ee459887ed7e)
	* [Mesos, Docker and Ochopod in Localization Services at Autodesk](http://cloudengineering.autodesk.com/blog/2015/11/mesos-docker-and-ochopod-in-autodesk-localization-services.html)
	* [Nanoservices at BBC Online](https://medium.com/bbc-design-engineering/powering-bbc-online-with-nanoservices-727840ba015b)
	* [PowerfulSeal: Testing Tool for Kubernetes Clusters at Bloomberg](https://www.techatbloomberg.com/blog/powerfulseal-testing-tool-kubernetes-clusters/)
	* [Conductor: Microservices Orchestrator at Netflix](https://medium.com/netflix-techblog/netflix-conductor-a-microservices-orchestrator-2e8d4771bf40)
	* [Making 10x Improvement in Release Times with Docker and Amazon ECS at Nextdoor](https://engblog.nextdoor.com/how-nextdoor-made-a-10x-improvement-in-release-times-with-docker-and-amazon-ecs-35aab52b726f)
* [Distributed Caching](https://www.wix.engineering/single-post/scaling-to-100m-to-cache-or-not-to-cache)
	* [Write-behind and Write-through](https://docs.oracle.com/cd/E15357_01/coh.360/e15723/cache_rtwtwbra.htm#COHDG5177)
	* [Eviction Policies](http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html)
	* [Peer-To-Peer Caching](https://en.wikipedia.org/wiki/P2P_caching)
	* [EVCache: Caching for a Global Netflix](https://medium.com/netflix-techblog/caching-for-a-global-netflix-7bcc457012f1)
	* [Memsniff: Robust Memcache Traffic Analyzer at Box.com](https://blog.box.com/blog/introducing-memsniff-robust-memcache-traffic-analyzer/)
	* [Caching with Consistent Hashing and Cache Smearing at Etsy](https://codeascraft.com/2017/11/30/how-etsy-caches/)
	* [An Analysis of Facebook Photo Caching](https://code.facebook.com/posts/220956754772273/an-analysis-of-facebook-photo-caching/)
	* [Reduce Memcached Memory Usage by 50% at Trivago](http://tech.trivago.com/2017/12/19/how-trivago-reduced-memcached-memory-usage-by-50/)
* [Distributed Tracking and Tracing](https://www.oreilly.com/ideas/understanding-the-value-of-distributed-tracing)
	* [Tracking Service Infrastructure at Scale at Shopify](https://www.usenix.org/conference/srecon17americas/program/presentation/arthorne)
	* [Distributed Tracing with Pintrace at Pinterest](https://medium.com/@Pinterest_Engineering/distributed-tracing-at-pinterest-with-new-open-source-tools-a4f8a5562f6b)
	* [Analyzing Distributed Trace Data at Pinterest](https://medium.com/@Pinterest_Engineering/analyzing-distributed-trace-data-6aae58919949)
	* [Distributed Tracing at Uber](https://eng.uber.com/distributed-tracing/)
	* [Data Checking at Dropbox](https://www.usenix.org/conference/srecon17asia/program/presentation/mah)
	* [Tracing distributed systems at Showmax](https://tech.showmax.com/2016/10/tracing-distributed-systems-at-showmax/)
* [Distributed Logging](https://blog.treasuredata.com/blog/2016/08/03/distributed-logging-architecture-in-the-container-era/)
	* [The Problem with Logging - Jeff Atwood](https://blog.codinghorror.com/the-problem-with-logging/)
	* [The Log: What Every Software Engineer Should Know](https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying)
	* [Using Logs to Build a Solid Data Infrastructure - Martin Kleppmann](https://www.confluent.io/blog/using-logs-to-build-a-solid-data-infrastructure-or-why-dual-writes-are-a-bad-idea/)
	* [Scalable and reliable log ingestion at Pinterest](https://medium.com/@Pinterest_Engineering/scalable-and-reliable-data-ingestion-at-pinterest-b921c2ee8754)
	* [Building DistributedLog at Twitter: High-performance replicated log service](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2015/building-distributedlog-twitter-s-high-performance-replicated-log-servic.html)
	* [Logging Service with Spark at CERN Accelerator](https://databricks.com/blog/2017/12/14/the-architecture-of-the-next-cern-accelerator-logging-service.html)
	* [Logging and Aggregation at Quora](https://engineering.quora.com/Logging-and-Aggregation-at-Quora)
	* [BookKeeper: Distributed Log Storage at Yahoo](https://yahooeng.tumblr.com/post/109908973316/bookkeeper-yahoos-distributed-log-storage-is)
	* [LogDevice: Distributed Data Store for Logs at Facebook](https://code.facebook.com/posts/357056558062811/logdevice-a-distributed-data-store-for-logs/)
* [Distributed Messaging](https://arxiv.org/pdf/1704.00411.pdf)
	* [When to use RabbitMQ or Kafka](https://content.pivotal.io/blog/understanding-when-to-use-rabbitmq-or-apache-kafka)
	* [Should You Put Several Event Types in the Same Kafka Topic? - Martin Kleppmann](https://www.confluent.io/blog/put-several-event-types-kafka-topic/)
	* [Kafka at Scale at LinkedIn](https://engineering.linkedin.com/kafka/running-kafka-scale)
	* [Delaying Asynchronous Message Processing with RabbitMQ at Indeed](http://engineering.indeedblog.com/blog/2017/06/delaying-messages/)
	* [Real-time Data Pipeline with Kafka at Yelp](https://engineeringblog.yelp.com/2016/07/billions-of-messages-a-day-yelps-real-time-data-pipeline.html)
	* [Audit Kafka End-to-End at Uber (count each message exactly once, audit a message across tiers)](https://eng.uber.com/chaperone/)
	* [Kafka for PaaS at Rakuten](https://techblog.rakuten.co.jp/2016/01/28/rakuten-paas-kafka/)
	* [Deduplication Techniques](https://en.wikipedia.org/wiki/Data_deduplication)
		* [Exactly-once Semantics are Possible: Here’s How Kafka Does it](https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/)
		* [Real-time Deduping at Scale with Kafka-based Pipeline at Tapjoy](http://eng.tapjoy.com/blog-list/real-time-deduping-at-scale)
		* [Delivering Billions of Messages Exactly Once: Deduping at Segment](https://segment.com/blog/exactly-once-delivery/)
* [Distributed Searching](http://nwds.cs.washington.edu/files/nwds/pdf/Distributed-WR.pdf)
	* [Search Architecture of Instagram](https://engineering.instagram.com/search-architecture-eeb34a936d3a)
	* [Search Architecture of eBay](http://www.cs.otago.ac.nz/homepages/andrew/papers/2017-8.pdf)
	* [Improving Search Engine Efficiency by over 25% at eBay](https://www.ebayinc.com/stories/blogs/tech/making-e-commerce-search-faster/)
	* [Elasticsearch Performance Tuning Practice at eBay](https://www.ebayinc.com/stories/blogs/tech/elasticsearch-performance-tuning-practice-at-ebay/)
	* [Nautilus: Travel Search Engine of Expedia](http://blog.expedia.com/expedias-nautilus-travel-search-engine-overview-and-applications/)
	* [Galene: Search Architecture of LinkedIn](https://engineering.linkedin.com/search/did-you-mean-galene)
	* [Search at Slack](https://slack.engineering/search-at-slack-431f8c80619e)
	* [Search Service (Half a Trillion Documents and Query Average Latency < 100ms) at Twitter (2014)](https://blog.twitter.com/engineering/en_us/a/2014/building-a-complete-tweet-index.html)
	* [Manas: High Performing Customized Search System at Pinterest](https://medium.com/@Pinterest_Engineering/manas-a-high-performing-customized-search-system-cf189f6ca40f)
	* [Sherlock: Near Real Time Search Indexing at Flipkart](https://tech.flipkart.com/sherlock-near-real-time-search-indexing-95519783859d)
	* [Nebula: Storage Platform to Build Search Backends at Airbnb](https://medium.com/airbnb-engineering/nebula-as-a-storage-platform-to-build-airbnbs-search-backends-ecc577b05f06)
	* [Elasticsearch at Kickstarter](https://kickstarter.engineering/elasticsearch-at-kickstarter-db3c487887fc)
-												Add a section for Distributed Searching

											
										
										
											2018-01-26 20:06:57 +06:00
+								* [Distributed Storage](http://highscalability.com/blog/2011/11/1/finding-the-right-data-solution-for-your-application-in-the.html)
-												Update README.md
											
										
										
											2017-12-27 09:47:31 +06:00
+									* [In-memory Storage](https://medium.com/@denisanikin/what-an-in-memory-database-is-and-how-it-persists-data-efficiently-f43868cff4c1)
-												Introduction to In-memory Data - Viktor Gamov, Solutions Architect at Hazelcast

											
										
										
											2018-03-07 06:07:24 +06:00
+										* [Introduction to In-memory Data - Viktor Gamov, Solutions Architect at Hazelcast](https://www.infoq.com/presentations/in-memory-data)
-												Optimizing Memcached Efficiency at Quora

											
										
										
											2018-01-02 07:28:28 +06:00
+										* [Optimizing Memcached Efficiency at Quora](https://engineering.quora.com/Optimizing-Memcached-Efficiency)
-												Real-Time Data Warehouse with MemSQL on Cisco UCS

											
										
										
											2018-01-04 17:17:04 +06:00
+										* [Real-Time Data Warehouse with MemSQL on Cisco UCS](https://blogs.cisco.com/datacenter/memsql)
-												Fix the MemSQL at Tapjoy entry

											
										
										
											2018-01-24 08:55:31 +06:00
+										* [Moving to MemSQL at Tapjoy: Horizontally Scalable, ACID Compliant, MySQL Compatibility](http://eng.tapjoy.com/blog-list/moving-to-memsql)
-												refactor

											
										
										
											2018-02-14 15:40:39 +06:00
+									* [Durable Storage (Amazon S3)](http://www.datacenterknowledge.com/archives/2013/10/04/object-storage-the-future-of-scale-out)
 										* [Reasons for Choosing S3 over HDFS at Databricks](https://databricks.com/blog/2017/05/31/top-5-reasons-for-choosing-s3-over-hdfs.html)
 										* [S3 in the Data Infrastructure at Airbnb](https://medium.com/airbnb-engineering/data-infrastructure-at-airbnb-8adfb34f169c)
 										* [Quantcast File System on Amazon S3](https://www.quantcast.com/blog/quantcast-file-system-on-amazon-s3/)
 										* [Using S3 in Netflix Chukwa](https://medium.com/netflix-techblog/evolution-of-the-netflix-data-pipeline-da246ca36905)
-												Refactor the Object Storage part

											
										
										
											2018-01-20 18:56:21 +06:00
+										* [Yahoo Cloud Object Store - Object Storage at Exabyte Scale](https://yahooeng.tumblr.com/post/116391291701/yahoo-cloud-object-store-object-storage-at)
+										* [Ambry: Distributed Immutable Object Store at LinkedIn](https://www.usenix.org/conference/srecon17americas/program/presentation/shenoy)
 										* [Hammerspace: Persistent, Concurrent, Off-heap Storage at Airbnb](https://medium.com/airbnb-engineering/hammerspace-persistent-concurrent-off-heap-storage-3db39bb04472)
+								* [Distributed Version Control](https://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/)
+									* [Distributed Version Control Systems: A Not-So-Quick Guide Through](https://www.infoq.com/articles/dvcs-guide)
+									* [Distributed Git Server at Palantir](https://medium.com/@palantir/stemma-distributed-git-server-70afbca0fc29)
 									* [Git Configuration Files (Dotfiles) Distribution at Booking.com](https://blog.booking.com/dotfiles-distribution-at-booking.com.html)
+									* [Configuration Management for Distributed Systems (using GitHub and cfg4j) at Flickr](https://code.flickr.net/2016/03/24/configuration-management-for-distributed-systems-using-github-and-cfg4j/)
 									* [Git Repo at Microsoft](https://blogs.msdn.microsoft.com/bharry/2017/05/24/the-largest-git-repo-on-the-planet/)
+								* [NoSQL](https://www.thoughtworks.com/insights/blog/nosql-databases-overview)
+									* [Key-Value Databases (DynamoDB, Voldemort, Manhattan)](http://highscalability.com/anti-rdbms-list-distributed-key-value-stores)
+										* [Scaling Mapbox infrastructure with DynamoDB Streams](https://blog.mapbox.com/scaling-mapbox-infrastructure-with-dynamodb-streams-d53eabc5e972)
+										* [Manhattan: Twitter’s distributed key-value database](https://blog.twitter.com/engineering/en_us/a/2014/manhattan-our-real-time-multi-tenant-distributed-database-for-twitter-scale.html)
+										* [Sherpa: Yahoo’s distributed NoSQL key-value store](https://yahooeng.tumblr.com/post/120730204806/sherpa-scales-new-heights)
+										* [Riak inside Chat Service Architecture at Riot Games](https://engineering.riotgames.com/news/chat-service-architecture-persistence)
+										* [MPH: Fast and Compact Immutable Key-Value Stores at Indeed](http://engineering.indeedblog.com/blog/2018/02/indeed-mph/)
+										* [zBase: High Performance, Elastic, Distributed Key-Value Store at Zynga](https://www.zynga.com/blogs/engineering/zbase-high-performance-elastic-distributed-key-value-store-2)
+									* [Column Databases (Cassandra, HBase, Vertica, Sybase IQ)](https://aws.amazon.com/nosql/columnar/)
+										* [Consistent Hashing in Cassandra](https://blog.imaginea.com/consistent-hashing-in-cassandra/)
+										* [When NOT to use Cassandra?](https://stackoverflow.com/questions/2634955/when-not-to-use-cassandra)
+										* [Avoid Pitfalls in Scaling Cassandra Cluster: Lessons and Remedies at Walmart Labs](https://medium.com/walmartlabs/avoid-pitfalls-in-scaling-your-cassandra-cluster-lessons-and-remedies-a71ca01f8c04)
+										* [Storing Images in Cassandra at Walmart Scale](https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593)
+										* [Cassandra at Instagram](https://www.slideshare.net/DataStax/cassandra-at-instagram-2016)
+										* [How Yelp Scaled Ad Analytics with Cassandra](https://engineeringblog.yelp.com/2016/08/how-we-scaled-our-ad-analytics-with-cassandra.html)
+										* [How Discord Stores Billions of Messages with Cassandra](https://blog.discordapp.com/how-discord-stores-billions-of-messages-7fa6ec7ee4c7)
+										* [Scale to serve 100+ million reads/writes using Spark and Cassandra at Dream11](https://medium.com/dream11-tech-blog/leaderboard-dream11-4efc6f93c23e)
+										* [Imgur Notification: From MySQL to HBase at Imgur](https://blog.imgur.com/2015/09/15/tech-tuesday-imgur-notifications-from-mysql-to-hbase/)
+										* [Moving Food Feed from Redis to Cassandra at Zomato](https://www.zomato.com/blog/how-we-moved-our-food-feed-from-redis-to-cassandra)
+										* [Benchmarking Cassandra Scalability on AWS at Netflix](https://medium.com/netflix-techblog/benchmarking-cassandra-scalability-on-aws-over-a-million-writes-per-second-39f45f066c9e)
+									* [Document Databases (MongoDB, SimpleDB, CouchDB)](https://msdn.microsoft.com/en-us/magazine/hh547103.aspx)
+										* [eBay: Building Mission-Critical Multi-Data Center Applications with MongoDB](https://www.mongodb.com/blog/post/ebay-building-mission-critical-multi-data-center-applications-with-mongodb)
+										* [MongoDB at Baidu: Multi-Tenant Cluster Storing 200+ Billion Documents across 160 Shards](https://www.mongodb.com/blog/post/mongodb-at-baidu-powering-100-apps-across-600-nodes-at-pb-scale)
+										* [The AWS and MongoDB Infrastructure of Parse (acquired by Facebook)](https://medium.baqend.com/parse-is-gone-a-few-secrets-about-their-infrastructure-91b3ab2fcf71)
+										* [Migrating Mountains of Mongo Data at Addepar](https://medium.com/build-addepar/migrating-mountains-of-mongo-data-63e530539952)
+										* [Couchbase Ecosystem at LinkedIn](https://engineering.linkedin.com/blog/2017/12/couchbase-ecosystem-at-linkedin)
+										* [SimpleDB at Zendesk](https://medium.com/zendesk-engineering/resurrecting-amazon-simpledb-9404034ec506)
+									* [Graph Databases](https://www.ibm.com/developerworks/library/cl-graph-database-1/index.html)
 										* [Handling Billions of Edges in a Graph Database](https://www.infoq.com/presentations/graph-database-scalability)
+										* [Neo4j case studies with Walmart, eBay, Airbnb, NASA, etc](https://neo4j.com/customers/)
 										* [FlockDB: Distributed Graph Database for Storing Adjacency Lists at Twitter](https://blog.twitter.com/engineering/en_us/a/2010/introducing-flockdb.html)
+										* [JanusGraph: Scalable Graph Database backed by Google, IBM and Hortonworks](https://architecht.io/google-ibm-back-new-open-source-graph-database-project-janusgraph-1d74fb78db6b)
+										* [Amazon Neptune](https://aws.amazon.com/neptune/)
+									* [Datastructure Databases (Redis, Hazelcast)](https://db-engines.com/en/system/Hazelcast%3BMemcached%3BRedis)
+										* [Using Redis To Scale at Twitter](http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html)
 										* [Scaling Job Queue with Redis at Slack](https://slack.engineering/scaling-slacks-job-queue-687222e9d100)
+										* [Moving persistent data out of Redis at GitHub](https://githubengineering.com/moving-persistent-data-out-of-redis/)
+										* [Storing Hundreds of Millions of Simple Key-Value Pairs in Redis at Instagram](https://engineering.instagram.com/storing-hundreds-of-millions-of-simple-key-value-pairs-in-redis-1091ae80f74c)
+										* [Redis in Chat Architecture of Twitch (from 27:22)](https://www.infoq.com/presentations/twitch-pokemon)
+										* [Learn Redis the hard way (in production) at Trivago](http://tech.trivago.com/2017/01/25/learn-redis-the-hard-way-in-production/)
+										* [Optimizing Session Key Storage in Redis at Deliveroo](https://deliveroo.engineering/2016/10/07/optimising-session-key-storage.html)
+										* [Optimizing Redis Storage at Deliveroo](https://deliveroo.engineering/2017/01/19/optimising-membership-queries.html)
+								* [RDBMS (MySQL, MSSQL, PostgreSQL)](https://www.mysql.com/products/cluster/scalability.html)
 									* [MS SQL versus MySQL](https://www.upwork.com/hiring/data/sql-vs-mysql-which-relational-database-is-right-for-you/)
+									* [SQL Database Performance Tuning](https://www.toptal.com/sql-server/sql-database-tuning-for-developers)
+									* [Scaling Distributed Joins](http://blog.memsql.com/scaling-distributed-joins/)
+									* [Why SQL is beating NoSQL, and what this means for the future of data](https://blog.timescale.com/why-sql-beating-nosql-what-this-means-for-future-of-data-time-series-database-348b777b847a)
+									* [MySQL Crash-Safe Replication, Parallel Replication, and Slave Scaling (10 parts) at Booking.com](https://blog.booking.com/author/jean-francois-gagne.html)
+									* [Sharding MySQL at Pinterest](https://medium.com/@Pinterest_Engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f)
+									* [Sharding MySQL at MailChimp](https://devs.mailchimp.com/blog/using-shards-to-accommodate-millions-of-users/)
+									* [How Airbnb Partitioned Main MySQL Database in Two Weeks](https://medium.com/airbnb-engineering/how-we-partitioned-airbnb-s-main-database-in-two-weeks-55f7e006ff21)
+									* [Replication is the Key for Scalability & High Availability](http://basho.com/posts/technical/replication-is-the-key-for-scalability-high-availability/)
+									* [How Twitch uses PostgreSQL](https://blog.twitch.tv/how-twitch-uses-postgresql-c34aa9e56f58)
+									* [Scaling MySQL-based financial reporting system at Airbnb](https://medium.com/airbnb-engineering/tracking-the-money-scaling-financial-reporting-at-airbnb-6d742b80f040)
+									* [Scaling to 100M at Wix: MySQL is a Better NoSQL](https://www.wix.engineering/single-post/scaling-to-100m-mysql-is-a-better-nosql)
+									* [Why Uber Engineering Switched from Postgres to MySQL](https://eng.uber.com/mysql-migration/)
+									* [Handling Growth with Postgres at Instagram](https://engineering.instagram.com/handling-growth-with-postgres-5-tips-from-instagram-d5d7e7ffdfcb)
+									* [Scaling the Analytics Database (Postgres) at TransferWise](http://tech.transferwise.com/scaling-our-analytics-database/)
+									* [MySQL Sharding (3 parts) at Evernote](https://blog.evernote.com/tech/2015/10/08/the-great-shard-migration-part-ii/)
+								* [Time Series Database (TSDB)](https://www.influxdata.com/time-series-database/)
+									* [Time Series Data: Why and How to Use a Relational Database instead of NoSQL](https://blog.timescale.com/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c)
+									* [Beringei: High-performance Time Series Storage Engine at Facebook](https://code.facebook.com/posts/952820474848503/beringei-a-high-performance-time-series-storage-engine/)
 									* [Atlas: In-memory Dimensional Time Series Database at Netflix](https://medium.com/netflix-techblog/introducing-atlas-netflixs-primary-telemetry-platform-bd31f4d8ed9a)
 									* [Heroic: Time Series Database at Spotify](https://labs.spotify.com/2015/11/17/monitoring-at-spotify-introducing-heroic/)
+									* [Roshi: Distributed Storage System for Time-Series Events at SoundCloud](https://developers.soundcloud.com/blog/roshi-a-crdt-system-for-timestamped-events)
+									* [Building a Scalable Time Series Database on PostgreSQL](https://blog.timescale.com/when-boring-is-awesome-building-a-scalable-time-series-database-on-postgresql-2900ea453ee2)
+									* [Scaling Time Series Data Storage at Netflix](https://medium.com/netflix-techblog/scaling-time-series-data-storage-part-i-ec2b6d44ba39)
+								* [HTTP Caching (Reverse Proxy, CDN)](https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching)
+									* [Reverse Proxy (Nginx, Varnish, Squid, rack-cache)](https://www.mertech.com/overview-reverse-proxying/)
+									* [Stop Worrying and Love the Proxy](https://blog.turbinelabs.io/how-we-learned-to-stop-worrying-and-love-the-proxy-89af98fabaf8)
+									* [Playing HTTP Tricks with Nginx](https://www.elastic.co/blog/playing-http-tricks-nginx)
+									* [Using CDN to Improve Site Performance at Coursera](https://building.coursera.org/blog/2015/07/09/improving-coursera-global-site-performance-a-head-to-head-cdn-battle-with-production-traffic/)
+									* [Strategy: Caching 404s Saved 66% On Server Time at The Onion](http://highscalability.com/blog/2010/3/26/strategy-caching-404s-saved-the-onion-66-on-server-time.html)
+									* [Increasing Application Performance with HTTP Cache Headers](https://devcenter.heroku.com/articles/increasing-application-performance-with-http-cache-headers)
+									* [Zynga Geo Proxy: Reducing Mobile Game Latency at Zynga](https://www.zynga.com/blogs/engineering/zynga-geo-proxy-reducing-mobile-game-latency)
+									* [Google AMP at Condé Nast](https://technology.condenast.com/story/the-why-and-how-of-google-amp-at-conde-nast)
+									* [Running A/B Tests on Hosting Infrastructure (CDNs) at Deliveroo](https://deliveroo.engineering/2016/09/19/ab-testing-cdns.html)
+									* [HAProxy with Kubernetes for User-facing Traffic at SoundCloud](https://developers.soundcloud.com/blog/how-soundcloud-uses-haproxy-with-kubernetes-for-user-facing-traffic)
+								* [Load Balancing](https://blog.vivekpanyam.com/scaling-a-web-service-load-balancing/)
 									* [Introduction to Modern Network Load Balancing and Proxying](https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236)
 									* [Load Balancing infrastructure to support more than 1.3 billion users at Facebook](https://www.usenix.org/conference/srecon15europe/program/presentation/shuff)
 									* [DHCPLB: Open Source Load Balancer for DHCP at Facebook](https://code.facebook.com/posts/1734309626831603/dhcplb-an-open-source-load-balancer/)
 									* [Load Balancing with Eureka at Netflix](https://medium.com/netflix-techblog/netflix-shares-cloud-load-balancing-and-failover-tool-eureka-c10647ef95e5)
 									* [Load Balancing at Yelp](https://engineeringblog.yelp.com/2017/05/taking-zero-downtime-load-balancing-even-further.html)
 									* [Load Balancing at GitHub](https://githubengineering.com/introducing-glb/)
 									* [Consistent Hashing to Improve Load Balancing at Vimeo](https://medium.com/vimeo-engineering-blog/improving-load-balancing-with-a-new-consistent-hashing-algorithm-9f1bd75709ed)
 									* [UDP Load Balancing at 500px](https://developers.500px.com/udp-load-balancing-with-keepalived-167382d7ad08)
+								* [Autoscaling](https://medium.com/@BotmetricHQ/top-11-hard-won-lessons-learned-about-aws-auto-scaling-5bfe56da755f)
+									* [A Horror Movie Featuring Auto Scaling Groups, EBS Volumes, Terraform, and Bash](https://blog.gruntwork.io/yak-shaving-series-1-all-i-need-is-a-little-bit-of-disk-space-6e5ef1644f67)
+									* [Autoscaling Pinterest](https://medium.com/@Pinterest_Engineering/auto-scaling-pinterest-df1d2beb4d64)
+									* [Autoscaling Based on Request Queuing at Square](https://medium.com/square-corner-blog/autoscaling-based-on-request-queuing-c4c0f57f860f)
 									* [Autoscaling Applications at PayPal](https://www.paypal-engineering.com/2017/08/16/autoscaling-applications-paypal/)
 									* [Autoscaling Jenkins at Trivago](http://tech.trivago.com/2017/02/17/your-definite-guide-for-autoscaling-jenkins/)
+									* [Scryer: Predictive Auto Scaling Engine at Netflix](https://medium.com/netflix-techblog/scryer-netflixs-predictive-auto-scaling-engine-a3f8fc922270)
+								* [Concurrency](http://joeduffyblog.com/2016/11/30/15-years-of-concurrency/)
+									* [Message-Passing Concurrency](https://link.springer.com/chapter/10.1007/978-3-642-35170-9_11)
 									* [Software Transactional Memory](https://dl.acm.org/citation.cfm?id=3037750)
 									* [Dataflow Concurrency](http://www.marketwired.com/press-release/java-concurrency-and-scalability-platform-akka-celebrates-fifth-anniversary-1928674.htm)
 									* [Shared-State Concurrency](https://common-lisp.net/project/ssc/darcs/spec/specification.pdf)
+									* [Concurrency series by Larry Osterman (Principal SDE at Microsoft)](https://social.msdn.microsoft.com/Profile/Larry%2bOsterman%2b%5BMSFT%5D/activity)
 										* [Part 8 – Concurrency for scalability](https://blogs.msdn.microsoft.com/larryosterman/2005/02/28/concurrency-part-8-concurrency-for-scalability/)
 										* [Part 9 - APIs that enable scalable programming](https://blogs.msdn.microsoft.com/larryosterman/2005/03/02/concurrency-part-9-apis-that-enable-scalable-programming/)
 										* [Part 10 - How do you know if you’ve got a scalability issue?](https://blogs.msdn.microsoft.com/larryosterman/2005/03/03/concurrency-part-10-how-do-you-know-if-youve-got-a-scalability-issue/)
 										* [Part 11 – Hidden scalability issues](https://blogs.msdn.microsoft.com/larryosterman/2005/03/04/concurrency-part-11-hidden-scalability-issues/)
 										* [Part 12 – Hidden scalability issues (cont)](https://blogs.msdn.microsoft.com/larryosterman/2005/03/07/concurrency-part-12-hidden-scalability-issues-part-2/)
+									* [Concurrency with Erlang](http://learnyousomeerlang.com/the-hitchhikers-guide-to-concurrency)
+										* [Erlang in WhatsApp](https://blog.whatsapp.com/196/1-million-is-so-2011)
 										* [Erlang in Riot Chat Server](https://engineering.riotgames.com/news/chat-service-architecture-servers)
 										* [How Discord Scaled Elixir to Five Million Concurrent Users](https://blog.discordapp.com/scaling-elixir-f9b8e1e7c29b)
+										* [Mnesia: A Distributed DBMS Rooted in Concurrency](https://www.developer.com/db/article.php/3864331/Mnesia-A-Distributed-DBMS-Rooted-in-Concurrency.htm)
 										* [Mnesia and CAP](https://medium.com/@jlouis666/mnesia-and-cap-d2673a92850)
+									* [Running Concurrent Queries in GoSocial (Go and Neo4j) at Medium](https://medium.engineering/running-concurrent-queries-in-gosocial-28e5841b05b5)
+									* [The Secret To 10 Million Concurrent Connections](http://highscalability.com/blog/2013/5/13/the-secret-to-10-million-concurrent-connections-the-kernel-i.html)
+								* [Parallel Computing](https://blogs.msdn.microsoft.com/ddperf/2009/05/02/are-we-taking-advantage-of-parallelism/)
 									* [SPMD (Single Program Multiple Data): The Genetic Pattern](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-186.html)
 									* [Master/Worker Pattern](https://docs.gigaspaces.com/sbp/master-worker-pattern.html)
 									* [Loop Parallelism Pattern: Extracting parallel tasks from loops](https://www.cs.umd.edu/class/fall2001/cmsc411/projects/unroll/main.htm)
 									* [Fork/Join Pattern: Good for recursive data processing](http://highscalability.com/learn-how-exploit-multiple-cores-better-performance-and-scalability)
 									* [Map-Reduce: Born for Simplified Data Processing on Large Clusters](http://static.googleusercontent.com/media/research.google.com/en/us/archive/mapreduce-osdi04.pdf)
 									* [On the Death of Map-Reduce - Henry Robinson, Cloudera](http://the-paper-trail.org/blog/the-elephant-was-a-trojan-horse-on-the-death-of-map-reduce-at-google/)
+									* [Server-side Optimization to Parallelize the Rendering of Web Pages at Yelp](https://engineeringblog.yelp.com/2017/07/generating-web-pages-in-parallel-with-pagelets.html)
+								* [Event-Driven Architecture](https://martinfowler.com/articles/201701-event-driven.html)
+									* [Stream Processing, Event Sourcing, Reactive, CEP, etc and Making sense of it all - Martin Kleppmann](https://www.confluent.io/blog/making-sense-of-stream-processing/)
+									* [Messaging](https://www.ibm.com/support/knowledgecenter/en/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/cjt1004_.html)
 										* [Publish-Subscribe](https://aws.amazon.com/pub-sub-messaging/)
 											* [Autoscaling Pub-Sub Consumers at Spotify](https://labs.spotify.com/2017/11/20/autoscaling-pub-sub-consumers/)
 											* [Pulsar: Pub-Sub Messaging at Scale at Yahoo](https://yahooeng.tumblr.com/post/150078336821/open-sourcing-pulsar-pub-sub-messaging-at-scale)
 											* [Wormhole: Pub-Sub system at Facebook (2013)](https://code.facebook.com/posts/188966771280871/wormhole-pub-sub-system-moving-data-through-space-and-time/)
 											* [Pub-Sub in Chatting Architecture on LINE LIVE](https://engineering.linecorp.com/en/blog/detail/85)
 										* [Point-To-Point and Its Differences from Pub-Sub](https://www.journaldev.com/9743/jms-messaging-models)
 										* [Store-Forward](https://docs.oracle.com/cd/E13222_01/wls/docs91/saf_admin/overview.html)
 										* [Request-Reply](https://docs.tibco.com/pub/ftl/4.3.0/doc/html/GUID-A64ABED1-682E-4E1D-A94A-5590CB91B9BB.html)
 									* [Enterprise Service Bus](http://www.oracle.com/technetwork/articles/soa/ind-soa-esb-1967705.html)
 									* [Domain Events](https://martinfowler.com/eaaDev/DomainEvent.html)
 										* [Domain Events: Simple and Reliable Solution](http://enterprisecraftsmanship.com/2017/10/03/domain-events-simple-and-reliable-solution/)
 									* [Event Stream Processing](https://www.sas.com/en_us/insights/articles/big-data/3-things-about-event-stream-processing.html)
 										* [Kafka Streams on Heroku](https://blog.heroku.com/kafka-streams-on-heroku)
 										* [Kafka in Platform Events Architecture at Salesforce](https://engineering.salesforce.com/how-apache-kafka-inspired-our-platform-events-architecture-2f351fe4cf63)
 										* [Bullet: Forward-Looking Query Engine for Streaming Data at Yahoo](https://yahooeng.tumblr.com/post/161855616651/open-sourcing-bullet-yahoos-forward-looking)
 										* [Benchmarking Streaming Computation Engines at Yahoo](https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at)
 									* [Event Sourcing](https://martinfowler.com/eaaDev/EventSourcing.html)
 										* [Event Sourced Architectures for High Availability](https://www.infoq.com/presentations/Event-Sourced-Architectures-for-High-Availability)
 										* [Event Sourcing and Stream Processing at Scale](https://martin.kleppmann.com/2016/01/29/event-sourcing-stream-processing-at-ddd-europe.html)
 										* [Scaling Event Sourcing for Netflix Downloads](https://www.infoq.com/presentations/netflix-scale-event-sourcing)
 										* [Scaling Event-Sourcing at Jet.com](https://medium.com/@eulerfx/scaling-event-sourcing-at-jet-9c873cac33b8)
 									* [Command & Query Responsibility Segregation (CQRS)](https://docs.microsoft.com/en-us/azure/architecture/patterns/cqrs)
 										* [Exploring CQRS and Event Sourcing - MSDN (with free ebook)](https://msdn.microsoft.com/en-us/library/jj554200.aspx)
 										* [CQRS Simple Architecture](https://www.future-processing.pl/blog/cqrs-simple-architecture/)
 										* [Building Scalable Applications Using Event Sourcing and CQRS with Kafka](https://initiate.andela.com/event-sourcing-and-cqrs-a-look-at-kafka-e0c1b90d17d8)
 								* [Distributed Machine Learning](https://arxiv.org/pdf/1512.09295.pdf)
 									* [Scalable Deep Learning Platform On Spark In Baidu](https://www.slideshare.net/JenAman/scalable-deep-learning-platform-on-spark-in-baidu)
 									* [Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow](https://eng.uber.com/horovod/)
 									* [Scaling Gradient Boosted Trees for Click-Through-Rate Prediction at Yelp](https://engineeringblog.yelp.com/2018/01/building-a-distributed-ml-pipeline-part1.html)
 									* [TensorFlowOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo](https://yahooeng.tumblr.com/post/157196488076/open-sourcing-tensorflowonspark-distributed-deep)
 									* [CaffeOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo](https://yahooeng.tumblr.com/post/139916828451/caffeonspark-open-sourced-for-distributed-deep)
 									* [AIOps in Practice at Baidu](https://www.usenix.org/conference/srecon17asia/program/presentation/qu)
 									* [Learning with Privacy at Scale - Differential Privacy Team, Apple](https://machinelearning.apple.com/2017/12/06/learning-with-privacy-at-scale.html)
 									* [Image Classification Experiment Using Deep Learning at Mercari](https://medium.com/mercari-engineering/mercaris-image-classification-experiment-using-deep-learning-9b4e994a18ec)
 									* [Content-based Video Relevance Prediction at Hulu](https://medium.com/hulu-tech-blog/content-based-video-relevance-prediction-b2c448e14752)
 									* [PaddlePaddle Fluid: Elastic Deep Learning on Kubernetes at Baidu](http://research.baidu.com/paddlepaddle-fluid-elastic-deep-learning-kubernetes/)
 									* [Training ML Models with Airflow and BigQuery at WePay](https://wecode.wepay.com/posts/training-machine-learning-models-with-airflow-and-bigquery)
 									* [Improving Photo Selection With Deep Learning at TripAdvisor](http://engineering.tripadvisor.com/improving-tripadvisor-photo-selection-deep-learning/)
 									* [Machine Learning (2 parts) at Condé Nast](https://technology.condenast.com/story/handbag-brand-and-color-detection)
 									* [Machine Learning Applications In The E-commerce Domain (4 parts) at Rakuten](https://techblog.rakuten.co.jp/2017/07/12/machine-learning-applications-in-the-e-commerce-domain-4/)
 									* [Venue Rating System at Foursquare](https://engineering.foursquare.com/finding-the-perfect-10-how-we-developed-the-foursquare-venue-rating-system-c76b08f7b9b3)
 								* [Distributed Architecture in Financial Systems](https://medium.com/@sofie_4036/lets-build-a-bank-service-architecture-410dca881291)
 									* [Building a Modern Bank Backend at Monzo](https://monzo.com/blog/2016/09/19/building-a-modern-bank-backend/)
 									* [Choosing an Architecture for Core Banking System at TrustBK](https://blog.trustbk.com/choosing-an-architecture-85750e1e5a03)
 									* [Reinventing the Trading Platform for Scale at Wealthsimple](https://medium.com/@Wealthsimple/engineering-at-wealthsimple-reinventing-our-trading-platform-for-scale-17e332241b6c)
 									* [Tech Stack at TransferWise](http://tech.transferwise.com/the-transferwise-stack-heartbeat-of-our-little-revolution/)
 								## Availability
 								* [Failover](http://cloudpatterns.org/mechanisms/failover_system)
 									* [The Evolution of Global Traffic Routing and Failover](https://www.usenix.org/conference/srecon16/program/presentation/heady)
 									* [Testing for Disaster Recovery Failover Testing](https://www.usenix.org/conference/srecon17asia/program/presentation/liu_zehua)
 									* [Designing a Microservices Architecture for Failure](https://blog.risingstack.com/designing-microservices-architecture-for-failure/)
 								* [Replication](https://m.alphasights.com/a-primer-on-database-replication-381b319cd032)
 									* [Master-Slave](https://engineering.bitnami.com/articles/enabling-additional-nodes-to-bitnami-mysql-with-replication.html)
 									* [Tree Replication](https://link.springer.com/chapter/10.1007/3-540-44863-2_47)
 									* [Master-Master](http://sabbour.me/highly-available-and-scalable-master-master-mysql-on-azure-virtual-machines/)
 									* [Buddy Replication](https://developer.jboss.org/wiki/JBossCacheBuddyReplicationDesign)
 								* [NodeJS High Availability at Yahoo](https://yahooeng.tumblr.com/post/68823943185/nodejs-high-availability)
 								* [Every Day Is Monday in Operations (11 parts) at LinkedIn](https://www.linkedin.com/pulse/introduction-every-day-monday-operations-benjamin-purgason)
 								* [Practical Guide to Monitoring and Alerting with Time Series at Scale](https://www.usenix.org/conference/srecon17americas/program/presentation/wilkinson)
 								* [How Robust Monitoring Powers High Availability for LinkedIn Feed](https://www.usenix.org/conference/srecon17americas/program/presentation/barot)
 								* [Architectural Patterns for High Availability - Adrian Cockcroft, Director of Architecture at Netflix](https://www.infoq.com/presentations/Netflix-Architecture)
 								* [Ensuring Resilience to Disaster at Quora](https://engineering.quora.com/Ensuring-Quoras-Resilience-to-Disaster)
 								* [Resiliency against Traffic Oversaturation at iHeartRadio](https://tech.iheart.com/resiliency-against-traffic-oversaturation-77c5ed92a5fb)
 								* [How Production Engineers Support Global Events at Facebook](https://code.facebook.com/posts/166966743929963/how-production-engineers-support-global-events-on-facebook/)
 								## Stability
 								* [Circuit Breaker](https://martinfowler.com/bliki/CircuitBreaker.html)
 									* [Circuit Breaking in Distributed Systems](https://www.infoq.com/presentations/circuit-breaking-distributed-systems)
 									* [Circuit Breakers for Distributed Services at LINE](https://engineering.linecorp.com/en/blog/detail/76)
 									* [Applying Circuit Breaker to Channel Gateway at LINE](https://engineering.linecorp.com/en/blog/detail/78)
 									* [Lessons in Resilience at SoundCloud](https://developers.soundcloud.com/blog/lessons-in-resilience-at-SoundCloud)
 									* [Circuit Breaker for Scaling Containers](https://f5.com/about-us/blog/articles/the-art-of-scaling-containers-circuit-breakers-28919)
 									* [Protector: Circuit Breaker for Time Series Databases at Trivago](http://tech.trivago.com/2016/02/23/protector/)
 								* [Always use timeouts (if possible)](https://www.javaworld.com/article/2824163/application-performance/stability-patterns-applied-in-a-restful-architecture.html)
 								* [Let it crash/Supervisors: Embrace failure as a natural state in the life-cycle of the application](http://erlang.org/doc/design_principles/sup_princ.html)
 								* [Crash early: An error now is better than a response tomorrow](http://odino.org/better-performance-the-case-for-timeouts/)
 								* [Bulkheads: Partition and tolerate failure in one part](https://skife.org/architecture/fault-tolerance/2009/12/31/bulkheads.html)
 								* [Steady state: Always put logs on a separate disk](https://docs.microsoft.com/en-us/sql/relational-databases/policy-based-management/place-data-and-log-files-on-separate-drives)
 								* [Throttling: Maintain a steady pace](http://www.sosp.org/2001/papers/welsh.pdf)
 								* [Multi-clustering: Improving Resiliency and Stability of a Large-scale Monolithic API Service at LinkedIn](https://engineering.linkedin.com/blog/2017/11/improving-resiliency-and-stability-of-a-large-scale-api)
 								## Performance
 								* [Web Performance: Cache Efficiency Exercise at Facebook](https://code.facebook.com/posts/964122680272229/web-performance-cache-efficiency-exercise/)
 								* [Improving Performance with Background Data Prefetching at Instagram](https://engineering.instagram.com/improving-performance-with-background-data-prefetching-b191acb39898)
 								* [Compression Techniques to Solve Network I/O Bottlenecks at eBay](https://www.ebayinc.com/stories/blogs/tech/how-ebays-shopping-cart-used-compression-techniques-to-solve-network-io-bottlenecks/)
 								* [Optimizing Web Servers for High Throughput and Low Latency at Dropbox](https://blogs.dropbox.com/tech/2017/09/optimizing-web-servers-for-high-throughput-and-low-latency/)
 								* [Boosting Site Speed Using Brotli Compression at LinkedIn](https://engineering.linkedin.com/blog/2017/05/boosting-site-speed-using-brotli-compression)
 								* [Linux Performance Analysis in 60,000 Milliseconds at Netflix](https://medium.com/netflix-techblog/linux-performance-analysis-in-60-000-milliseconds-accc10403c55)
 								* [Optimizing 360 Photos at Scale at Facebook](https://code.facebook.com/posts/129055711052260/optimizing-360-photos-at-scale/)
 								* [Reducing Image File Size in the Photos Infrastructure at Etsy](https://codeascraft.com/2017/05/30/reducing-image-file-size-at-etsy/)
 								* [Improving Video Thumbnails with Deep Neural Nets at YouTube](https://youtube-eng.googleblog.com/2015/10/improving-youtube-video-thumbnails-with_8.html)
 								* [Optimizing APIs through Dynamic Polyglot Runtime, Fully Asynchronous, and Reactive Programming at Netflix](https://medium.com/netflix-techblog/optimizing-the-netflix-api-5c9ac715cf19)
 								* [Optimizing Video Playback Performance at Pinterest](https://medium.com/@Pinterest_Engineering/optimizing-video-playback-performance-caf55ce310d1)
 								* [Reducing Video Loading Time by Prefetching during Preroll at Dailymotion](http://engineering.dailymotion.com/reducing-video-loading-time-prefetching-video-during-preroll/)
 								* [Improving GIF Performance at Pinterest](https://medium.com/@Pinterest_Engineering/improving-gif-performance-on-pinterest-8dad74bf92f1)
 								* [Performance Improvements (All Stacks) at Pinterest](https://medium.com/@Pinterest_Engineering/driving-user-growth-with-performance-improvements-cfc50dafadd7)
 								* [Server Side Rendering at Wix](https://www.youtube.com/watch?v=f9xI2jR71Ms)
 								* [30x Performance Improvements on MySQLStreamer at Yelp](https://engineeringblog.yelp.com/2018/02/making-30x-performance-improvements-on-yelps-mysqlstreamer.html)
 								* [Performance Monitoring with Riemann and Clojure at Walmart](https://medium.com/walmartlabs/performance-monitoring-with-riemann-and-clojure-eafc07fcd375)
 								* [Improving Homepage Performance at Zillow](https://www.zillow.com/engineering/improving-homepage-performance/)
 								* [Decreasing RAM Usage by 40% Using jemalloc with Python & Celery at Zapier](https://zapier.com/engineering/celery-python-jemalloc/)
 								* [Using Java Large Heap (110 GB) for Boosting Site Performance at Expedia](https://techblog.expedia.com/2015/09/25/solving-problems-with-very-large-java-heaps/)
 								## Others
 								* [Architecture of Tripod (Flickr’s Backend)](https://yahooeng.tumblr.com/post/157200523046/introducing-tripod-flickrs-backend-refactored)
 								* [Architecture of SurveyMonkey](https://engineering.surveymonkey.com/2016/04/09/the-architecture-behind-surveymonkey/)
 								* [Architecture of Data Platform at Flipkart](https://tech.flipkart.com/overview-of-flipkart-data-platform-20c6d3e9a196)
 								* [Architecture of Stack Overflow Enterprise at Palantir](https://medium.com/@palantir/terraforming-stack-overflow-enterprise-in-aws-47ee431e6be7)
 								* [Architecture of Distributed Cron at Quora](https://engineering.quora.com/Quoras-Distributed-Cron-Architecture)
 								* [Simone: Distributed Simulation Service at Netflix](https://medium.com/netflix-techblog/https-medium-com-netflix-techblog-simone-a-distributed-simulation-service-b2c85131ca1b)
 								* [Seagull: Distributed System that Helps Running > 20 Million Tests Per Day at Yelp](https://engineeringblog.yelp.com/2017/04/how-yelp-runs-millions-of-tests-every-day.html)
 								* [Cloud Bouncer: Distributed Rate Limiting at Yahoo](https://yahooeng.tumblr.com/post/111288877956/cloud-bouncer-distributed-rate-limiting-at-yahoo)
 								* [Selecting a Cloud Provider at Etsy](https://codeascraft.com/2018/01/04/selecting-a-cloud-provider/)
 								* [Basic Infrastructure Patterns at Zenefits](https://engineering.zenefits.com/2016/02/basic-infrastructure-patterns/)
 								* [Syscall Auditing at Scale at Slack](https://slack.engineering/syscall-auditing-at-scale-e6a3ca8ac1b8)
 								* [Scaling Online Migrations at Stripe](https://stripe.com/blog/online-migrations)
 								* [Netflix: What Happens When You Press Play?](http://highscalability.com/blog/2017/12/11/netflix-what-happens-when-you-press-play.html)
 								* [Service Decomposition at Scale at Intuit QuickBooks](https://quickbooks-engineering.intuit.com/service-decomposition-at-scale-70405ac2f637)
 								* [Back-end at BlaBlaCar](http://blablatech.com/blog/BlaBlaTech-behind-the-scene)
 								* [Scalable Gaming Patterns on AWS](https://d0.awsstatic.com/whitepapers/aws-scalable-gaming-patterns.pdf)
 								* [How League Of Legends Scaled Chat To 70 Million Players](http://highscalability.com/blog/2014/10/13/how-league-of-legends-scaled-chat-to-70-million-players-it-t.html)
 								* [Scaling NodeJS at Alibaba](https://www.linux.com/blog/can-nodejs-scale-ask-team-alibaba)
 								* [Distributed Firewall at LinkedIn](https://www.youtube.com/watch?v=Kb_dU6t56mo)
 								## Talks
 								* [Distributed Systems in One Lesson - Tim Berglund, Senior Director of Developer Experience at Confluent](https://www.youtube.com/watch?v=Y6Ev8GIlbxc)
 								* [Building Real Time Infrastructure at Facebook - Jeff Barber and Shie Erlich, Software Engineers at Facebook](https://www.usenix.org/conference/srecon17americas/program/presentation/erlich)
 								* [Building Reliable Social Infrastructure for Google - Marc Alvidrez, Senior Manager at Google](https://www.usenix.org/conference/srecon16/program/presentation/alvidrez)
 								* [Site Reliability Engineering at Dropbox - Tammy Butow, Site Reliability Engineering Manager at Dropbox](https://www.youtube.com/watch?v=ggizCjUCCqE)
 								* [How Google Does Planet-Scale for Planet-Scale Infra - Melissa Binde, SRE Director for Google Cloud Platform](https://www.youtube.com/watch?v=H4vMcD7zKM0)
 								* [Netflix Guide to Microservices - Josh Evans, Director of Operations Engineering at Netflix](https://www.youtube.com/watch?v=CZ3wIuvmHeM&t=2837s)
 								* [Achieving Rapid Response Times in Large Online Services - Jeff Dean, Google Senior Fellow](https://www.youtube.com/watch?v=1-3Ahy7Fxsc)
 								* [Architecture to Handle 80K RPS Celebrity Sales at Shopify - Simon Eskildsen, Engineering Lead at Shopify](https://www.youtube.com/watch?v=N8NWDHgWA28)
 								* [Lessons of Scale at Facebook - Bobby Johnson, Director of Engineering at Facebook](https://www.youtube.com/watch?v=QCHiNEw73AU)
 								* [Performance Optimization for the Greater China Region at Salesforce - Jeff Cheng, Enterprise Architect at Salesforce](https://www.salesforce.com/video/1757880/)
 								* [How GIPHY Delivers a GIF to 300 Million Users - Alex Hoang and Nima Khoshini, Services Engineers at GIPHY](https://vimeo.com/252367076)
 								* [High Performance Packet Processing Platform at Alibaba - Haiyong Wang, Senior Director at Alibaba](https://www.youtube.com/watch?v=wzsxJqeVIhY&list=PLMu8-hpCxIVENuAue7bd0eCAglLGY_8AW&index=7)
* [Scaling Dropbox - Kevin Modzelewski, Back-end Engineer at Dropbox](https://www.youtube.com/watch?v=PE4gwstWhmc)
* [Scaling Reliability at Dropbox - Sat Kriya Khalsa, SRE at Dropbox](https://www.youtube.com/watch?v=IhGWOaD5BYQ)
* [Scaling Live Videos to a Billion Users at Facebook - Sachin Kulkarni, Director of Engineering at Facebook](https://www.youtube.com/watch?v=IO4teCbHvZw)
* [Scaling Infrastructure at Instagram - Lisa Guo, Instagram Engineering](https://www.youtube.com/watch?v=hnpzNAPiC0E)
* [Scaling Infrastructure at Twitter - Yao Yue, Staff Software Engineer at Twitter](https://www.youtube.com/watch?v=6OvrFkLSoZ0)
* [Scaling Infrastructure at Etsy - Bethany Macri, Engineering Manager at Etsy](https://www.youtube.com/watch?v=LfqyhM1LeIU)
* [Scaling Data Infrastructure at Spotify - Matti (Lepistö) Pehrs, Spotify](https://www.youtube.com/watch?v=cdsfRXr9pJU)
* [Scaling Pinterest - Marty Weiner, Pinterest’s founding engineer](https://www.youtube.com/watch?v=jQNCuD_hxdQ&list=RDhnpzNAPiC0E&index=11)
* [Scaling Slack - Bing Wei, Software Engineer (Infrastructure) at Slack](https://www.infoq.com/presentations/slack-scalability)
* [Scaling Backend at YouTube - Sugu Sougoumarane, SDE at YouTube](https://www.youtube.com/watch?v=5yDO-tmIoXY&feature=youtu.be)
* [Scaling Backend at Uber - Matt Ranney, Chief Systems Architect at Uber](https://www.youtube.com/watch?v=nuiLcWE8sPA)
* [Scaling Global CDN at Netflix - Dave Temkin, Director of Global Networks at Netflix](https://www.youtube.com/watch?v=tbqcsHg-Q_o)
* [Scaling Load Balancing Infra to Support 1.3 Billion Users at Facebook - Patrick Shuff, Production Engineer at Facebook](https://www.youtube.com/watch?v=bxhYNfFeVF4)
* [Scaling (a NSFW site) to 200 Million Views A Day And Beyond - Eric Pickup, Lead Platform Developer at MindGeek](https://www.youtube.com/watch?v=RlkCdM_f3p4)
* [Scaling Counting Infrastructure at Quora - Chun-Ho Hung and Nikhil Garg, SEs at Quora](https://www.infoq.com/presentations/quora-analytics)
* [Scaling Git at Microsoft - Saeed Noursalehi, Principal Program Manager at Microsoft](https://www.youtube.com/watch?v=g_MPGU_m01s)

## Books
* [Google Site Reliability Engineering (Online - Free)](https://landing.google.com/sre/book.html)
* [Distributed Systems for Fun and Profit (Online - Free)](http://book.mixu.net/distsys/)
* [What Every Developer Should Know About SQL Performance (Online - Free)](https://use-the-index-luke.com/sql/table-of-contents)
* [Beyond the Twelve-Factor App - Exploring the DNA of Highly Scalable, Resilient Cloud Applications (Free)](http://www.oreilly.com/webops-perf/free/beyond-the-twelve-factor-app.csp)
* [Chaos Engineering - Building Confidence in System Behavior through Experiments (Free)](http://www.oreilly.com/webops-perf/free/chaos-engineering.csp?intcmp=il-webops-free-product-na_new_site_chaos_engineering_text_cta)
* [The Art of Scalability](http://theartofscalability.com/)
* [Designing Data-Intensive Applications](https://dataintensive.net/)
* [Web Scalability for Startup Engineers](https://www.goodreads.com/book/show/23615147-web-scalability-for-startup-engineers)
* [Scalability Rules: 50 Principles for Scaling Web Sites](http://scalabilityrules.com/)

## Special Thanks
* Jonas Bonér, CTO at Lightbend, for the [original inspiration](https://www.slideshare.net/jboner/scalability-availability-stability-patterns)

## License
[![CC-BY](https://mirrors.creativecommons.org/presskit/buttons/88x31/svg/by.svg)](https://creativecommons.org/licenses/by/4.0/)

Copyright Benny (Quoc-Binh) Nguyen, 2018. Licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).