From 989c9bf6b11cb4ab8234758763b8216ad7d318ff Mon Sep 17 00:00:00 2001 From: John Washam Date: Tue, 22 Nov 2016 12:31:34 -0800 Subject: [PATCH] Updating the list of what you need to know (for entry-level engineers) based on recent info. --- README.md | 382 +++++++++++++++++++++++++++++------------------------- 1 file changed, 205 insertions(+), 177 deletions(-) diff --git a/README.md b/README.md index 76fe858..912d68d 100644 --- a/README.md +++ b/README.md @@ -13,6 +13,9 @@ There are extra items I added at the bottom that may come up in the interview or Steve Yegge's "[Get that job at Google](http://steve-yegge.blogspot.com/2008/03/get-that-job-at-google.html)" and are reflected sometimes word-for-word in Google's coaching notes. +I've pared down what you need to know from what Yegge says for new software engineers. If you have many years of experience, expect a harder interview. +[Read more here](https://googleyasheck.com/what-you-need-to-know-for-your-google-interview-and-what-you-dont/). + --- ## Table of Contents @@ -46,11 +49,20 @@ sometimes word-for-word in Google's coaching notes. - [Trees - Notes & Background](#trees---notes--background) - [Binary search trees: BSTs](#binary-search-trees-bsts) - [Heap / Priority Queue / Binary Heap](#heap--priority-queue--binary-heap) - - [Tries](#tries) - - [Balanced search trees](#balanced-search-trees) - - [N-ary (K-ary, M-ary) trees](#n-ary-k-ary-m-ary-trees) + - balanced search trees (general concept, not details) + - traversals: preorder, inorder, postorder, BFS, DFS - [Sorting](#sorting) + - selection + - insertion + - heapsort + - quicksort + - merge sort - [Graphs](#graphs) + - directed + - undirected + - adjacency matrix + - adjacency list + - traversals: BFS, DFS - [Even More Knowledge](#even-more-knowledge) - [Recursion](#recursion) - [Dynamic Programming](#dynamic-programming) @@ -58,12 +70,12 @@ sometimes word-for-word in Google's coaching notes. - [NP, NP-Complete and Approximation Algorithms](#np-np-complete-and-approximation-algorithms) - [Caches](#caches) - [Processes and Threads](#processes-and-threads) - - [System Design, Scalability, Data Handling](#system-design-scalability-data-handling) - [Papers](#papers) - [Testing](#testing) - [Scheduling](#scheduling) - [Implement system routines](#implement-system-routines) - [String searching & manipulations](#string-searching--manipulations) +- [System Design, Scalability, Data Handling](#system-design-scalability-data-handling) (if you have 4+ years experience) - [Final Review](#final-review) - [Coding Question Practice](#coding-question-practice) - [Coding exercises/challenges](#coding-exerciseschallenges) @@ -88,7 +100,7 @@ sometimes word-for-word in Google's coaching notes. - [Entropy](#entropy) - [Cryptography](#cryptography) - [Compression](#compression) - - [Networking](#networking) + - [Networking](#networking) (if you have networking experience, expect questions) - [Computer Security](#computer-security) - [Garbage collection](#garbage-collection) - [Parallel Programming](#parallel-programming) @@ -100,6 +112,16 @@ sometimes word-for-word in Google's coaching notes. - [Locality-Sensitive Hashing](#locality-sensitive-hashing) - [van Emde Boas Trees](#van-emde-boas-trees) - [Augmented Data Structures](#augmented-data-structures) + - [Tries](#tries) + - [N-ary (K-ary, M-ary) trees](#n-ary-k-ary-m-ary-trees) + - [Balanced search trees](#balanced-search-trees) + - AVL trees + - Splay trees + - Red/black trees + - 2-3 search trees + - 2-3-4 Trees (aka 2-4 trees) + - N-ary (K-ary, M-ary) trees + - B-Trees - [k-D Trees](#k-d-trees) - [Skip lists](#skip-lists) - [Network Flows](#network-flows) @@ -645,113 +667,6 @@ Write code on a whiteboard or paper, not a computer. Test with some sample input - [ ] heap_sort() - take an unsorted array and turn it into a sorted array in-place using a max heap - note: using a min heap instead would save operations, but double the space needed (cannot do in-place). -- ### Tries - - Note there are different kinds of tries. Some have prefixes, some don't, and some use string instead of bits - to track the path. - - I read through code, but will not implement. - - [ ] [Sedgewick - Tries (3 videos)](https://www.youtube.com/playlist?list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ) - - [ ] [1. R Way Tries](https://www.youtube.com/watch?v=buq2bn8x3Vo&index=3&list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ) - - [ ] [2. Ternary Search Tries](https://www.youtube.com/watch?v=LelV-kkYMIg&index=2&list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ) - - [ ] [3. Character Based Operations](https://www.youtube.com/watch?v=00YaFPcC65g&list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ&index=1) - - [ ] [Notes on Data Structures and Programming Techniques](http://www.cs.yale.edu/homes/aspnes/classes/223/notes.html#Tries) - - [ ] Short course videos: - - [ ] [Introduction To Tries (video)](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/08Xyf/core-introduction-to-tries) - - [ ] [Performance Of Tries (video)](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/PvlZW/core-performance-of-tries) - - [ ] [Implementing A Trie (video)](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/DFvd3/core-implementing-a-trie) - - [ ] [The Trie: A Neglected Data Structure](https://www.toptal.com/java/the-trie-a-neglected-data-structure) - - [ ] [TopCoder - Using Tries](https://www.topcoder.com/community/data-science/data-science-tutorials/using-tries/) - - [ ] [Stanford Lecture (real world use case) (video)](https://www.youtube.com/watch?v=TJ8SkcUSdbU) - - [ ] [MIT, Advanced Data Structures, Strings (can get pretty obscure about halfway through)](https://www.youtube.com/watch?v=NinWEPPrkDQ&index=16&list=PLUl4u3cNGP61hsJNdULdudlRL493b-XZf) - -- ### Balanced search trees - - Know least one type of balanced binary tree (and know how it's implemented): - - "Among balanced search trees, AVL and 2/3 trees are now passé, and red-black trees seem to be more popular. - A particularly interesting self-organizing data structure is the splay tree, which uses rotations - to move any accessed key to the root." - Skiena - - Of these, I chose to implement a splay tree. From what I've read, you won't implement a - balanced search tree in your interview. But I wanted exposure to coding one up - and let's face it, splay trees are the bee's knees. I did read a lot of red-black tree code. - - splay tree: insert, search, delete functions - If you end up implementing red/black tree try just these: - - search and insertion functions, skipping delete - - I want to learn more about B-Tree since it's used so widely with very large data sets. - - [ ] [Self-balancing binary search tree](https://en.wikipedia.org/wiki/Self-balancing_binary_search_tree) - - - [ ] **AVL trees** - - In practice: - From what I can tell, these aren't used much in practice, but I could see where they would be: - The AVL tree is another structure supporting O(log n) search, insertion, and removal. It is more rigidly - balanced than red–black trees, leading to slower insertion and removal but faster retrieval. This makes it - attractive for data structures that may be built once and loaded without reconstruction, such as language - dictionaries (or program dictionaries, such as the opcodes of an assembler or interpreter). - - [ ] [MIT AVL Trees / AVL Sort (video)](https://www.youtube.com/watch?v=FNeL18KsWPc&list=PLUl4u3cNGP61Oq3tWYp6V_F-5jb5L2iHb&index=6) - - [ ] [AVL Trees (video)](https://www.coursera.org/learn/data-structures/lecture/Qq5E0/avl-trees) - - [ ] [AVL Tree Implementation (video)](https://www.coursera.org/learn/data-structures/lecture/PKEBC/avl-tree-implementation) - - [ ] [Split And Merge](https://www.coursera.org/learn/data-structures/lecture/22BgE/split-and-merge) - - - [ ] **Splay trees** - - In practice: - Splay trees are typically used in the implementation of caches, memory allocators, routers, garbage collectors, - data compression, ropes (replacement of string used for long text strings), in Windows NT (in the virtual memory, - networking, and file system code) etc. - - [ ] [CS 61B: Splay Trees (video)](https://www.youtube.com/watch?v=Najzh1rYQTo&index=23&list=PL-XXv-cvA_iAlnI-BQr9hjqADPBtujFJd) - - [ ] MIT Lecture: Splay Trees: - - Gets very mathy, but watch the last 10 minutes for sure. - - [Video](https://www.youtube.com/watch?v=QnPl_Y6EqMo) - - - [ ] **2-3 search trees** - - In practice: - 2-3 trees have faster inserts at the expense of slower searches (since height is more compared to AVL trees). - - You would use 2-3 tree very rarely because its implementation involves different types of nodes. Instead, people use Red Black trees. - - [ ] [23-Tree Intuition and Definition (video)](https://www.youtube.com/watch?v=C3SsdUqasD4&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6&index=2) - - [ ] [Binary View of 23-Tree](https://www.youtube.com/watch?v=iYvBtGKsqSg&index=3&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6) - - [ ] [2-3 Trees (student recitation) (video)](https://www.youtube.com/watch?v=TOb1tuEZ2X4&index=5&list=PLUl4u3cNGP6317WaSNfmCvGym2ucw3oGp) - - - [ ] **2-3-4 Trees (aka 2-4 trees)** - - In practice: - For every 2-4 tree, there are corresponding red–black trees with data elements in the same order. The insertion and deletion - operations on 2-4 trees are also equivalent to color-flipping and rotations in red–black trees. This makes 2-4 trees an - important tool for understanding the logic behind red–black trees, and this is why many introductory algorithm texts introduce - 2-4 trees just before red–black trees, even though **2-4 trees are not often used in practice**. - - [ ] [CS 61B Lecture 26: Balanced Search Trees (video)](https://www.youtube.com/watch?v=zqrqYXkth6Q&index=26&list=PL4BBB74C7D2A1049C) - - [ ] [Bottom Up 234-Trees (video)](https://www.youtube.com/watch?v=DQdMYevEyE4&index=4&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6) - - [ ] [Top Down 234-Trees (video)](https://www.youtube.com/watch?v=2679VQ26Fp4&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6&index=5) - - - [ ] **B-Trees** - - fun fact: it's a mystery, but the B could stand for Boeing, Balanced, or Bayer (co-inventor) - - In Practice: - B-Trees are widely used in databases. Most modern filesystems use B-trees (or Variants). In addition to - its use in databases, the B-tree is also used in filesystems to allow quick random access to an arbitrary - block in a particular file. The basic problem is turning the file block i address into a disk block - (or perhaps to a cylinder-head-sector) address. - - [ ] [B-Tree](https://en.wikipedia.org/wiki/B-tree) - - [ ] [Introduction to B-Trees (video)](https://www.youtube.com/watch?v=I22wEC1tTGo&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6&index=6) - - [ ] [B-Tree Definition and Insertion (video)](https://www.youtube.com/watch?v=s3bCdZGrgpA&index=7&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6) - - [ ] [B-Tree Deletion (video)](https://www.youtube.com/watch?v=svfnVhJOfMc&index=8&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6) - - [ ] [MIT 6.851 - Memory Hierarchy Models (video)](https://www.youtube.com/watch?v=V3omVLzI0WE&index=7&list=PLUl4u3cNGP61hsJNdULdudlRL493b-XZf) - - covers cache-oblivious B-Trees, very interesting data structures - - the first 37 minutes are very technical, may be skipped (B is block size, cache line size) - - - [ ] **Red/black trees** - - In practice: - Red–black trees offer worst-case guarantees for insertion time, deletion time, and search time. - Not only does this make them valuable in time-sensitive applications such as real-time applications, - but it makes them valuable building blocks in other data structures which provide worst-case guarantees; - for example, many data structures used in computational geometry can be based on red–black trees, and - the Completely Fair Scheduler used in current Linux kernels uses red–black trees. In the version 8 of Java, - the Collection HashMap has been modified such that instead of using a LinkedList to store identical elements with poor - hashcodes, a Red-Black tree is used. - - [ ] [Aduni - Algorithms - Lecture 4 (link jumps to starting point) (video)](https://youtu.be/1W3x0f_RmUo?list=PLFDnELG9dpVxQCxuD-9BSy2E7BWY3t5Sm&t=3871) - - [ ] [Aduni - Algorithms - Lecture 5 (video)](https://www.youtube.com/watch?v=hm2GHwyKF1o&list=PLFDnELG9dpVxQCxuD-9BSy2E7BWY3t5Sm&index=5) - - [ ] [Black Tree](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree) - - [ ] [An Introduction To Binary Search And Red Black Tree](https://www.topcoder.com/community/data-science/data-science-tutorials/an-introduction-to-binary-search-and-red-black-trees/) - -- ### N-ary (K-ary, M-ary) trees - - note: the N or K is the branching factor (max branches) - - binary trees are a 2-ary tree, with branching factor = 2 - - 2-3 trees are 3-ary - - [ ] [K-Ary Tree](https://en.wikipedia.org/wiki/K-ary_tree) - ## Sorting - [ ] Notes: @@ -1002,9 +917,78 @@ You'll get more graph practice in Skiena's book (see Books section below) and th - [ ] [Keynote David Beazley - Topics of Interest (Python Asyncio)](https://www.youtube.com/watch?v=ZzfHjytDceU) - [ ] [Mutex in Python](https://www.youtube.com/watch?v=0zaPs8OtyKY) +- ### Papers + - These are Google papers and well-known papers. + - Reading all from end to end with full comprehension will likely take more time than you have. I recommend being selective on papers and their sections. + - [ ] [1978: Communicating Sequential Processes](http://spinroot.com/courses/summer/Papers/hoare_1978.pdf) + - [implemented in Go](https://godoc.org/github.com/thomas11/csp) + - [Love classic papers?](https://www.cs.cmu.edu/~crary/819-f09/) + - [ ] [2003: The Google File System](http://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf) + - replaced by Colossus in 2012 + - [ ] [2004: MapReduce: Simplified Data Processing on Large Clusters]( http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf) + - mostly replaced by Cloud Dataflow? + - [ ] [2007: What Every Programmer Should Know About Memory (very long, and the author encourages skipping of some sections)](https://www.akkadia.org/drepper/cpumemory.pdf) + - [ ] [2012: Google's Colossus](https://www.wired.com/2012/07/google-colossus/) + - paper not available + - [ ] 2012: AddressSanitizer: A Fast Address Sanity Checker: + - [paper](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37752.pdf) + - [video](https://www.usenix.org/conference/atc12/technical-sessions/presentation/serebryany) + - [ ] 2013: Spanner: Google’s Globally-Distributed Database: + - [paper](http://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf) + - [video](https://www.usenix.org/node/170855) + - [ ] [2014: Machine Learning: The High-Interest Credit Card of Technical Debt](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43146.pdf) + - [ ] [2015: Continuous Pipelines at Google](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43790.pdf) + - [ ] [2015: High-Availability at Massive Scale: Building Google’s Data Infrastructure for Ads](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44686.pdf) + - [ ] [2015: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](http://download.tensorflow.org/paper/whitepaper2015.pdf ) + - [ ] [2015: How Developers Search for Code: A Case Study](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43835.pdf) + - [ ] [2016: Borg, Omega, and Kubernetes](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44843.pdf) + +- ### Testing + - To cover: + - how unit testing works + - what are mock objects + - what is integration testing + - what is dependency injection + - [ ] [Agile Software Testing with James Bach (video)](https://www.youtube.com/watch?v=SAhJf36_u5U) + - [ ] [Open Lecture by James Bach on Software Testing (video)](https://www.youtube.com/watch?v=ILkT_HV9DVU) + - [ ] [Steve Freeman - Test-Driven Development (that’s not what we meant) (video)](https://vimeo.com/83960706) + - [slides](http://gotocon.com/dl/goto-berlin-2013/slides/SteveFreeman_TestDrivenDevelopmentThatsNotWhatWeMeant.pdf) + - [ ] [TDD is dead. Long live testing.](http://david.heinemeierhansson.com/2014/tdd-is-dead-long-live-testing.html) + - [ ] [Is TDD dead? (video)](https://www.youtube.com/watch?v=z9quxZsLcfo) + - [ ] [Video series (152 videos) - not all are needed (video)](https://www.youtube.com/watch?v=nzJapzxH_rE&list=PLAwxTw4SYaPkWVHeC_8aSIbSxE_NXI76g) + - [ ] [Test-Driven Web Development with Python](http://www.obeythetestinggoat.com/pages/book.html#toc) + - [ ] Dependency injection: + - [ ] [video](https://www.youtube.com/watch?v=IKD2-MAkXyQ) + - [ ] [Tao Of Testing](http://jasonpolites.github.io/tao-of-testing/ch3-1.1.html) + - [ ] [How to write tests](http://jasonpolites.github.io/tao-of-testing/ch4-1.1.html) + +- ### Scheduling + - in an OS, how it works + - can be gleaned from Operating System videos + +- ### Implement system routines + - understand what lies beneath the programming APIs you use + - can you implement them? + +- ### String searching & manipulations + - [ ] [Sedgewick - Suffix Arrays (video)](https://www.youtube.com/watch?v=HKPrVm5FWvg) + - [ ] [Sedgewick - Substring Search (videos)](https://www.youtube.com/watch?v=2LvvVFCEIv8&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=5) + - [ ] [1. Introduction to Substring Search](https://www.youtube.com/watch?v=2LvvVFCEIv8&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=5) + - [ ] [2. Brute-Force Substring Search](https://www.youtube.com/watch?v=CcDXwIGEXYU&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=4) + - [ ] [3. Knuth-Morris Pratt](https://www.youtube.com/watch?v=n-7n-FDEWzc&index=3&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66) + - [ ] [4. Boyer-Moore](https://www.youtube.com/watch?v=fI7Ch6pZXfM&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=2) + - [ ] [5. Rabin-Karp](https://www.youtube.com/watch?v=QzI0p6zDjK4&index=1&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66) + - [ ] [Search pattern in text (video)](https://www.coursera.org/learn/data-structures/lecture/tAfHI/search-pattern-in-text) + + If you need more detail on this subject, see "String Matching" section in [Additional Detail on Some Subjects](#additional-detail-on-some-subjects) + +--- Scalability and System Design are very large topics with many topics and resources, since there is a lot to consider when designing a software/hardware system that can scale. Expect to spend quite a bit of time on this. + + You can expect system design questions if you have 4+ years of experience + - ### System Design, Scalability, Data Handling - Considerations from Yegge: @@ -1147,71 +1131,6 @@ You'll get more graph practice in Skiena's book (see Books section below) and th - [Design a URL-shortener system: copied from above](http://www.hiredintech.com/system-design/the-system-design-process/) - [Design a cache system](https://www.adayinthelifeof.nl/2011/02/06/memcache-internals/) -- ### Papers - - These are Google papers and well-known papers. - - Reading all from end to end with full comprehension will likely take more time than you have. I recommend being selective on papers and their sections. - - [ ] [1978: Communicating Sequential Processes](http://spinroot.com/courses/summer/Papers/hoare_1978.pdf) - - [implemented in Go](https://godoc.org/github.com/thomas11/csp) - - [Love classic papers?](https://www.cs.cmu.edu/~crary/819-f09/) - - [ ] [2003: The Google File System](http://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf) - - replaced by Colossus in 2012 - - [ ] [2004: MapReduce: Simplified Data Processing on Large Clusters]( http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf) - - mostly replaced by Cloud Dataflow? - - [ ] [2007: What Every Programmer Should Know About Memory (very long, and the author encourages skipping of some sections)](https://www.akkadia.org/drepper/cpumemory.pdf) - - [ ] [2012: Google's Colossus](https://www.wired.com/2012/07/google-colossus/) - - paper not available - - [ ] 2012: AddressSanitizer: A Fast Address Sanity Checker: - - [paper](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37752.pdf) - - [video](https://www.usenix.org/conference/atc12/technical-sessions/presentation/serebryany) - - [ ] 2013: Spanner: Google’s Globally-Distributed Database: - - [paper](http://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf) - - [video](https://www.usenix.org/node/170855) - - [ ] [2014: Machine Learning: The High-Interest Credit Card of Technical Debt](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43146.pdf) - - [ ] [2015: Continuous Pipelines at Google](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43790.pdf) - - [ ] [2015: High-Availability at Massive Scale: Building Google’s Data Infrastructure for Ads](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44686.pdf) - - [ ] [2015: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](http://download.tensorflow.org/paper/whitepaper2015.pdf ) - - [ ] [2015: How Developers Search for Code: A Case Study](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43835.pdf) - - [ ] [2016: Borg, Omega, and Kubernetes](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44843.pdf) - -- ### Testing - - To cover: - - how unit testing works - - what are mock objects - - what is integration testing - - what is dependency injection - - [ ] [Agile Software Testing with James Bach (video)](https://www.youtube.com/watch?v=SAhJf36_u5U) - - [ ] [Open Lecture by James Bach on Software Testing (video)](https://www.youtube.com/watch?v=ILkT_HV9DVU) - - [ ] [Steve Freeman - Test-Driven Development (that’s not what we meant) (video)](https://vimeo.com/83960706) - - [slides](http://gotocon.com/dl/goto-berlin-2013/slides/SteveFreeman_TestDrivenDevelopmentThatsNotWhatWeMeant.pdf) - - [ ] [TDD is dead. Long live testing.](http://david.heinemeierhansson.com/2014/tdd-is-dead-long-live-testing.html) - - [ ] [Is TDD dead? (video)](https://www.youtube.com/watch?v=z9quxZsLcfo) - - [ ] [Video series (152 videos) - not all are needed (video)](https://www.youtube.com/watch?v=nzJapzxH_rE&list=PLAwxTw4SYaPkWVHeC_8aSIbSxE_NXI76g) - - [ ] [Test-Driven Web Development with Python](http://www.obeythetestinggoat.com/pages/book.html#toc) - - [ ] Dependency injection: - - [ ] [video](https://www.youtube.com/watch?v=IKD2-MAkXyQ) - - [ ] [Tao Of Testing](http://jasonpolites.github.io/tao-of-testing/ch3-1.1.html) - - [ ] [How to write tests](http://jasonpolites.github.io/tao-of-testing/ch4-1.1.html) - -- ### Scheduling - - in an OS, how it works - - can be gleaned from Operating System videos - -- ### Implement system routines - - understand what lies beneath the programming APIs you use - - can you implement them? - -- ### String searching & manipulations - - [ ] [Sedgewick - Suffix Arrays (video)](https://www.youtube.com/watch?v=HKPrVm5FWvg) - - [ ] [Sedgewick - Substring Search (videos)](https://www.youtube.com/watch?v=2LvvVFCEIv8&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=5) - - [ ] [1. Introduction to Substring Search](https://www.youtube.com/watch?v=2LvvVFCEIv8&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=5) - - [ ] [2. Brute-Force Substring Search](https://www.youtube.com/watch?v=CcDXwIGEXYU&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=4) - - [ ] [3. Knuth-Morris Pratt](https://www.youtube.com/watch?v=n-7n-FDEWzc&index=3&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66) - - [ ] [4. Boyer-Moore](https://www.youtube.com/watch?v=fI7Ch6pZXfM&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=2) - - [ ] [5. Rabin-Karp](https://www.youtube.com/watch?v=QzI0p6zDjK4&index=1&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66) - - [ ] [Search pattern in text (video)](https://www.coursera.org/learn/data-structures/lecture/tAfHI/search-pattern-in-text) - - If you need more detail on this subject, see "String Matching" section in [Additional Detail on Some Subjects](#additional-detail-on-some-subjects) - --- ## Final Review @@ -1678,6 +1597,115 @@ You're never really done. - ### Augmented Data Structures - [ ] [CS 61B Lecture 39: Augmenting Data Structures](https://youtu.be/zksIj9O8_jc?list=PL4BBB74C7D2A1049C&t=950) +- ### Tries + - Note there are different kinds of tries. Some have prefixes, some don't, and some use string instead of bits + to track the path. + - I read through code, but will not implement. + - [ ] [Sedgewick - Tries (3 videos)](https://www.youtube.com/playlist?list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ) + - [ ] [1. R Way Tries](https://www.youtube.com/watch?v=buq2bn8x3Vo&index=3&list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ) + - [ ] [2. Ternary Search Tries](https://www.youtube.com/watch?v=LelV-kkYMIg&index=2&list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ) + - [ ] [3. Character Based Operations](https://www.youtube.com/watch?v=00YaFPcC65g&list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ&index=1) + - [ ] [Notes on Data Structures and Programming Techniques](http://www.cs.yale.edu/homes/aspnes/classes/223/notes.html#Tries) + - [ ] Short course videos: + - [ ] [Introduction To Tries (video)](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/08Xyf/core-introduction-to-tries) + - [ ] [Performance Of Tries (video)](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/PvlZW/core-performance-of-tries) + - [ ] [Implementing A Trie (video)](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/DFvd3/core-implementing-a-trie) + - [ ] [The Trie: A Neglected Data Structure](https://www.toptal.com/java/the-trie-a-neglected-data-structure) + - [ ] [TopCoder - Using Tries](https://www.topcoder.com/community/data-science/data-science-tutorials/using-tries/) + - [ ] [Stanford Lecture (real world use case) (video)](https://www.youtube.com/watch?v=TJ8SkcUSdbU) + - [ ] [MIT, Advanced Data Structures, Strings (can get pretty obscure about halfway through)](https://www.youtube.com/watch?v=NinWEPPrkDQ&index=16&list=PLUl4u3cNGP61hsJNdULdudlRL493b-XZf) + +- ### Balanced search trees + - Know least one type of balanced binary tree (and know how it's implemented): + - "Among balanced search trees, AVL and 2/3 trees are now passé, and red-black trees seem to be more popular. + A particularly interesting self-organizing data structure is the splay tree, which uses rotations + to move any accessed key to the root." - Skiena + - Of these, I chose to implement a splay tree. From what I've read, you won't implement a + balanced search tree in your interview. But I wanted exposure to coding one up + and let's face it, splay trees are the bee's knees. I did read a lot of red-black tree code. + - splay tree: insert, search, delete functions + If you end up implementing red/black tree try just these: + - search and insertion functions, skipping delete + - I want to learn more about B-Tree since it's used so widely with very large data sets. + - [ ] [Self-balancing binary search tree](https://en.wikipedia.org/wiki/Self-balancing_binary_search_tree) + + - [ ] **AVL trees** + - In practice: + From what I can tell, these aren't used much in practice, but I could see where they would be: + The AVL tree is another structure supporting O(log n) search, insertion, and removal. It is more rigidly + balanced than red–black trees, leading to slower insertion and removal but faster retrieval. This makes it + attractive for data structures that may be built once and loaded without reconstruction, such as language + dictionaries (or program dictionaries, such as the opcodes of an assembler or interpreter). + - [ ] [MIT AVL Trees / AVL Sort (video)](https://www.youtube.com/watch?v=FNeL18KsWPc&list=PLUl4u3cNGP61Oq3tWYp6V_F-5jb5L2iHb&index=6) + - [ ] [AVL Trees (video)](https://www.coursera.org/learn/data-structures/lecture/Qq5E0/avl-trees) + - [ ] [AVL Tree Implementation (video)](https://www.coursera.org/learn/data-structures/lecture/PKEBC/avl-tree-implementation) + - [ ] [Split And Merge](https://www.coursera.org/learn/data-structures/lecture/22BgE/split-and-merge) + + - [ ] **Splay trees** + - In practice: + Splay trees are typically used in the implementation of caches, memory allocators, routers, garbage collectors, + data compression, ropes (replacement of string used for long text strings), in Windows NT (in the virtual memory, + networking, and file system code) etc. + - [ ] [CS 61B: Splay Trees (video)](https://www.youtube.com/watch?v=Najzh1rYQTo&index=23&list=PL-XXv-cvA_iAlnI-BQr9hjqADPBtujFJd) + - [ ] MIT Lecture: Splay Trees: + - Gets very mathy, but watch the last 10 minutes for sure. + - [Video](https://www.youtube.com/watch?v=QnPl_Y6EqMo) + + - [ ] **Red/black trees** + - these are a translation of a 2-3 tree (see below) + - In practice: + Red–black trees offer worst-case guarantees for insertion time, deletion time, and search time. + Not only does this make them valuable in time-sensitive applications such as real-time applications, + but it makes them valuable building blocks in other data structures which provide worst-case guarantees; + for example, many data structures used in computational geometry can be based on red–black trees, and + the Completely Fair Scheduler used in current Linux kernels uses red–black trees. In the version 8 of Java, + the Collection HashMap has been modified such that instead of using a LinkedList to store identical elements with poor + hashcodes, a Red-Black tree is used. + - [ ] [Aduni - Algorithms - Lecture 4 (link jumps to starting point) (video)](https://youtu.be/1W3x0f_RmUo?list=PLFDnELG9dpVxQCxuD-9BSy2E7BWY3t5Sm&t=3871) + - [ ] [Aduni - Algorithms - Lecture 5 (video)](https://www.youtube.com/watch?v=hm2GHwyKF1o&list=PLFDnELG9dpVxQCxuD-9BSy2E7BWY3t5Sm&index=5) + - [ ] [Black Tree](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree) + - [ ] [An Introduction To Binary Search And Red Black Tree](https://www.topcoder.com/community/data-science/data-science-tutorials/an-introduction-to-binary-search-and-red-black-trees/) + + - [ ] **2-3 search trees** + - In practice: + 2-3 trees have faster inserts at the expense of slower searches (since height is more compared to AVL trees). + - You would use 2-3 tree very rarely because its implementation involves different types of nodes. Instead, people use Red Black trees. + - [ ] [23-Tree Intuition and Definition (video)](https://www.youtube.com/watch?v=C3SsdUqasD4&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6&index=2) + - [ ] [Binary View of 23-Tree](https://www.youtube.com/watch?v=iYvBtGKsqSg&index=3&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6) + - [ ] [2-3 Trees (student recitation) (video)](https://www.youtube.com/watch?v=TOb1tuEZ2X4&index=5&list=PLUl4u3cNGP6317WaSNfmCvGym2ucw3oGp) + + - [ ] **2-3-4 Trees (aka 2-4 trees)** + - In practice: + For every 2-4 tree, there are corresponding red–black trees with data elements in the same order. The insertion and deletion + operations on 2-4 trees are also equivalent to color-flipping and rotations in red–black trees. This makes 2-4 trees an + important tool for understanding the logic behind red–black trees, and this is why many introductory algorithm texts introduce + 2-4 trees just before red–black trees, even though **2-4 trees are not often used in practice**. + - [ ] [CS 61B Lecture 26: Balanced Search Trees (video)](https://www.youtube.com/watch?v=zqrqYXkth6Q&index=26&list=PL4BBB74C7D2A1049C) + - [ ] [Bottom Up 234-Trees (video)](https://www.youtube.com/watch?v=DQdMYevEyE4&index=4&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6) + - [ ] [Top Down 234-Trees (video)](https://www.youtube.com/watch?v=2679VQ26Fp4&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6&index=5) + + - [ ] **N-ary (K-ary, M-ary) trees** + - note: the N or K is the branching factor (max branches) + - binary trees are a 2-ary tree, with branching factor = 2 + - 2-3 trees are 3-ary + - [ ] [K-Ary Tree](https://en.wikipedia.org/wiki/K-ary_tree) + + - [ ] **B-Trees** + - fun fact: it's a mystery, but the B could stand for Boeing, Balanced, or Bayer (co-inventor) + - In Practice: + B-Trees are widely used in databases. Most modern filesystems use B-trees (or Variants). In addition to + its use in databases, the B-tree is also used in filesystems to allow quick random access to an arbitrary + block in a particular file. The basic problem is turning the file block i address into a disk block + (or perhaps to a cylinder-head-sector) address. + - [ ] [B-Tree](https://en.wikipedia.org/wiki/B-tree) + - [ ] [Introduction to B-Trees (video)](https://www.youtube.com/watch?v=I22wEC1tTGo&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6&index=6) + - [ ] [B-Tree Definition and Insertion (video)](https://www.youtube.com/watch?v=s3bCdZGrgpA&index=7&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6) + - [ ] [B-Tree Deletion (video)](https://www.youtube.com/watch?v=svfnVhJOfMc&index=8&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6) + - [ ] [MIT 6.851 - Memory Hierarchy Models (video)](https://www.youtube.com/watch?v=V3omVLzI0WE&index=7&list=PLUl4u3cNGP61hsJNdULdudlRL493b-XZf) + - covers cache-oblivious B-Trees, very interesting data structures + - the first 37 minutes are very technical, may be skipped (B is block size, cache line size) + + - ### k-D Trees - great for finding number of points in a rectangle or higher dimension object - a good fit for k-nearest neighbors