Updating the list of what you need to know (for entry-level engineers) based on recent info.

This commit is contained in:
John Washam 2016-11-22 12:31:34 -08:00
parent 8ce532760d
commit 989c9bf6b1
1 changed files with 205 additions and 177 deletions

382
README.md
View File

@ -13,6 +13,9 @@ There are extra items I added at the bottom that may come up in the interview or
Steve Yegge's "[Get that job at Google](http://steve-yegge.blogspot.com/2008/03/get-that-job-at-google.html)" and are reflected
sometimes word-for-word in Google's coaching notes.
I've pared down what you need to know from what Yegge says for new software engineers. If you have many years of experience, expect a harder interview.
[Read more here](https://googleyasheck.com/what-you-need-to-know-for-your-google-interview-and-what-you-dont/).
---
## Table of Contents
@ -46,11 +49,20 @@ sometimes word-for-word in Google's coaching notes.
- [Trees - Notes & Background](#trees---notes--background)
- [Binary search trees: BSTs](#binary-search-trees-bsts)
- [Heap / Priority Queue / Binary Heap](#heap--priority-queue--binary-heap)
- [Tries](#tries)
- [Balanced search trees](#balanced-search-trees)
- [N-ary (K-ary, M-ary) trees](#n-ary-k-ary-m-ary-trees)
- balanced search trees (general concept, not details)
- traversals: preorder, inorder, postorder, BFS, DFS
- [Sorting](#sorting)
- selection
- insertion
- heapsort
- quicksort
- merge sort
- [Graphs](#graphs)
- directed
- undirected
- adjacency matrix
- adjacency list
- traversals: BFS, DFS
- [Even More Knowledge](#even-more-knowledge)
- [Recursion](#recursion)
- [Dynamic Programming](#dynamic-programming)
@ -58,12 +70,12 @@ sometimes word-for-word in Google's coaching notes.
- [NP, NP-Complete and Approximation Algorithms](#np-np-complete-and-approximation-algorithms)
- [Caches](#caches)
- [Processes and Threads](#processes-and-threads)
- [System Design, Scalability, Data Handling](#system-design-scalability-data-handling)
- [Papers](#papers)
- [Testing](#testing)
- [Scheduling](#scheduling)
- [Implement system routines](#implement-system-routines)
- [String searching & manipulations](#string-searching--manipulations)
- [System Design, Scalability, Data Handling](#system-design-scalability-data-handling) (if you have 4+ years experience)
- [Final Review](#final-review)
- [Coding Question Practice](#coding-question-practice)
- [Coding exercises/challenges](#coding-exerciseschallenges)
@ -88,7 +100,7 @@ sometimes word-for-word in Google's coaching notes.
- [Entropy](#entropy)
- [Cryptography](#cryptography)
- [Compression](#compression)
- [Networking](#networking)
- [Networking](#networking) (if you have networking experience, expect questions)
- [Computer Security](#computer-security)
- [Garbage collection](#garbage-collection)
- [Parallel Programming](#parallel-programming)
@ -100,6 +112,16 @@ sometimes word-for-word in Google's coaching notes.
- [Locality-Sensitive Hashing](#locality-sensitive-hashing)
- [van Emde Boas Trees](#van-emde-boas-trees)
- [Augmented Data Structures](#augmented-data-structures)
- [Tries](#tries)
- [N-ary (K-ary, M-ary) trees](#n-ary-k-ary-m-ary-trees)
- [Balanced search trees](#balanced-search-trees)
- AVL trees
- Splay trees
- Red/black trees
- 2-3 search trees
- 2-3-4 Trees (aka 2-4 trees)
- N-ary (K-ary, M-ary) trees
- B-Trees
- [k-D Trees](#k-d-trees)
- [Skip lists](#skip-lists)
- [Network Flows](#network-flows)
@ -645,113 +667,6 @@ Write code on a whiteboard or paper, not a computer. Test with some sample input
- [ ] heap_sort() - take an unsorted array and turn it into a sorted array in-place using a max heap
- note: using a min heap instead would save operations, but double the space needed (cannot do in-place).
- ### Tries
- Note there are different kinds of tries. Some have prefixes, some don't, and some use string instead of bits
to track the path.
- I read through code, but will not implement.
- [ ] [Sedgewick - Tries (3 videos)](https://www.youtube.com/playlist?list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ)
- [ ] [1. R Way Tries](https://www.youtube.com/watch?v=buq2bn8x3Vo&index=3&list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ)
- [ ] [2. Ternary Search Tries](https://www.youtube.com/watch?v=LelV-kkYMIg&index=2&list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ)
- [ ] [3. Character Based Operations](https://www.youtube.com/watch?v=00YaFPcC65g&list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ&index=1)
- [ ] [Notes on Data Structures and Programming Techniques](http://www.cs.yale.edu/homes/aspnes/classes/223/notes.html#Tries)
- [ ] Short course videos:
- [ ] [Introduction To Tries (video)](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/08Xyf/core-introduction-to-tries)
- [ ] [Performance Of Tries (video)](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/PvlZW/core-performance-of-tries)
- [ ] [Implementing A Trie (video)](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/DFvd3/core-implementing-a-trie)
- [ ] [The Trie: A Neglected Data Structure](https://www.toptal.com/java/the-trie-a-neglected-data-structure)
- [ ] [TopCoder - Using Tries](https://www.topcoder.com/community/data-science/data-science-tutorials/using-tries/)
- [ ] [Stanford Lecture (real world use case) (video)](https://www.youtube.com/watch?v=TJ8SkcUSdbU)
- [ ] [MIT, Advanced Data Structures, Strings (can get pretty obscure about halfway through)](https://www.youtube.com/watch?v=NinWEPPrkDQ&index=16&list=PLUl4u3cNGP61hsJNdULdudlRL493b-XZf)
- ### Balanced search trees
- Know least one type of balanced binary tree (and know how it's implemented):
- "Among balanced search trees, AVL and 2/3 trees are now passé, and red-black trees seem to be more popular.
A particularly interesting self-organizing data structure is the splay tree, which uses rotations
to move any accessed key to the root." - Skiena
- Of these, I chose to implement a splay tree. From what I've read, you won't implement a
balanced search tree in your interview. But I wanted exposure to coding one up
and let's face it, splay trees are the bee's knees. I did read a lot of red-black tree code.
- splay tree: insert, search, delete functions
If you end up implementing red/black tree try just these:
- search and insertion functions, skipping delete
- I want to learn more about B-Tree since it's used so widely with very large data sets.
- [ ] [Self-balancing binary search tree](https://en.wikipedia.org/wiki/Self-balancing_binary_search_tree)
- [ ] **AVL trees**
- In practice:
From what I can tell, these aren't used much in practice, but I could see where they would be:
The AVL tree is another structure supporting O(log n) search, insertion, and removal. It is more rigidly
balanced than redblack trees, leading to slower insertion and removal but faster retrieval. This makes it
attractive for data structures that may be built once and loaded without reconstruction, such as language
dictionaries (or program dictionaries, such as the opcodes of an assembler or interpreter).
- [ ] [MIT AVL Trees / AVL Sort (video)](https://www.youtube.com/watch?v=FNeL18KsWPc&list=PLUl4u3cNGP61Oq3tWYp6V_F-5jb5L2iHb&index=6)
- [ ] [AVL Trees (video)](https://www.coursera.org/learn/data-structures/lecture/Qq5E0/avl-trees)
- [ ] [AVL Tree Implementation (video)](https://www.coursera.org/learn/data-structures/lecture/PKEBC/avl-tree-implementation)
- [ ] [Split And Merge](https://www.coursera.org/learn/data-structures/lecture/22BgE/split-and-merge)
- [ ] **Splay trees**
- In practice:
Splay trees are typically used in the implementation of caches, memory allocators, routers, garbage collectors,
data compression, ropes (replacement of string used for long text strings), in Windows NT (in the virtual memory,
networking, and file system code) etc.
- [ ] [CS 61B: Splay Trees (video)](https://www.youtube.com/watch?v=Najzh1rYQTo&index=23&list=PL-XXv-cvA_iAlnI-BQr9hjqADPBtujFJd)
- [ ] MIT Lecture: Splay Trees:
- Gets very mathy, but watch the last 10 minutes for sure.
- [Video](https://www.youtube.com/watch?v=QnPl_Y6EqMo)
- [ ] **2-3 search trees**
- In practice:
2-3 trees have faster inserts at the expense of slower searches (since height is more compared to AVL trees).
- You would use 2-3 tree very rarely because its implementation involves different types of nodes. Instead, people use Red Black trees.
- [ ] [23-Tree Intuition and Definition (video)](https://www.youtube.com/watch?v=C3SsdUqasD4&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6&index=2)
- [ ] [Binary View of 23-Tree](https://www.youtube.com/watch?v=iYvBtGKsqSg&index=3&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6)
- [ ] [2-3 Trees (student recitation) (video)](https://www.youtube.com/watch?v=TOb1tuEZ2X4&index=5&list=PLUl4u3cNGP6317WaSNfmCvGym2ucw3oGp)
- [ ] **2-3-4 Trees (aka 2-4 trees)**
- In practice:
For every 2-4 tree, there are corresponding redblack trees with data elements in the same order. The insertion and deletion
operations on 2-4 trees are also equivalent to color-flipping and rotations in redblack trees. This makes 2-4 trees an
important tool for understanding the logic behind redblack trees, and this is why many introductory algorithm texts introduce
2-4 trees just before redblack trees, even though **2-4 trees are not often used in practice**.
- [ ] [CS 61B Lecture 26: Balanced Search Trees (video)](https://www.youtube.com/watch?v=zqrqYXkth6Q&index=26&list=PL4BBB74C7D2A1049C)
- [ ] [Bottom Up 234-Trees (video)](https://www.youtube.com/watch?v=DQdMYevEyE4&index=4&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6)
- [ ] [Top Down 234-Trees (video)](https://www.youtube.com/watch?v=2679VQ26Fp4&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6&index=5)
- [ ] **B-Trees**
- fun fact: it's a mystery, but the B could stand for Boeing, Balanced, or Bayer (co-inventor)
- In Practice:
B-Trees are widely used in databases. Most modern filesystems use B-trees (or Variants). In addition to
its use in databases, the B-tree is also used in filesystems to allow quick random access to an arbitrary
block in a particular file. The basic problem is turning the file block i address into a disk block
(or perhaps to a cylinder-head-sector) address.
- [ ] [B-Tree](https://en.wikipedia.org/wiki/B-tree)
- [ ] [Introduction to B-Trees (video)](https://www.youtube.com/watch?v=I22wEC1tTGo&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6&index=6)
- [ ] [B-Tree Definition and Insertion (video)](https://www.youtube.com/watch?v=s3bCdZGrgpA&index=7&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6)
- [ ] [B-Tree Deletion (video)](https://www.youtube.com/watch?v=svfnVhJOfMc&index=8&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6)
- [ ] [MIT 6.851 - Memory Hierarchy Models (video)](https://www.youtube.com/watch?v=V3omVLzI0WE&index=7&list=PLUl4u3cNGP61hsJNdULdudlRL493b-XZf)
- covers cache-oblivious B-Trees, very interesting data structures
- the first 37 minutes are very technical, may be skipped (B is block size, cache line size)
- [ ] **Red/black trees**
- In practice:
Redblack trees offer worst-case guarantees for insertion time, deletion time, and search time.
Not only does this make them valuable in time-sensitive applications such as real-time applications,
but it makes them valuable building blocks in other data structures which provide worst-case guarantees;
for example, many data structures used in computational geometry can be based on redblack trees, and
the Completely Fair Scheduler used in current Linux kernels uses redblack trees. In the version 8 of Java,
the Collection HashMap has been modified such that instead of using a LinkedList to store identical elements with poor
hashcodes, a Red-Black tree is used.
- [ ] [Aduni - Algorithms - Lecture 4 (link jumps to starting point) (video)](https://youtu.be/1W3x0f_RmUo?list=PLFDnELG9dpVxQCxuD-9BSy2E7BWY3t5Sm&t=3871)
- [ ] [Aduni - Algorithms - Lecture 5 (video)](https://www.youtube.com/watch?v=hm2GHwyKF1o&list=PLFDnELG9dpVxQCxuD-9BSy2E7BWY3t5Sm&index=5)
- [ ] [Black Tree](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree)
- [ ] [An Introduction To Binary Search And Red Black Tree](https://www.topcoder.com/community/data-science/data-science-tutorials/an-introduction-to-binary-search-and-red-black-trees/)
- ### N-ary (K-ary, M-ary) trees
- note: the N or K is the branching factor (max branches)
- binary trees are a 2-ary tree, with branching factor = 2
- 2-3 trees are 3-ary
- [ ] [K-Ary Tree](https://en.wikipedia.org/wiki/K-ary_tree)
## Sorting
- [ ] Notes:
@ -1002,9 +917,78 @@ You'll get more graph practice in Skiena's book (see Books section below) and th
- [ ] [Keynote David Beazley - Topics of Interest (Python Asyncio)](https://www.youtube.com/watch?v=ZzfHjytDceU)
- [ ] [Mutex in Python](https://www.youtube.com/watch?v=0zaPs8OtyKY)
- ### Papers
- These are Google papers and well-known papers.
- Reading all from end to end with full comprehension will likely take more time than you have. I recommend being selective on papers and their sections.
- [ ] [1978: Communicating Sequential Processes](http://spinroot.com/courses/summer/Papers/hoare_1978.pdf)
- [implemented in Go](https://godoc.org/github.com/thomas11/csp)
- [Love classic papers?](https://www.cs.cmu.edu/~crary/819-f09/)
- [ ] [2003: The Google File System](http://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf)
- replaced by Colossus in 2012
- [ ] [2004: MapReduce: Simplified Data Processing on Large Clusters]( http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf)
- mostly replaced by Cloud Dataflow?
- [ ] [2007: What Every Programmer Should Know About Memory (very long, and the author encourages skipping of some sections)](https://www.akkadia.org/drepper/cpumemory.pdf)
- [ ] [2012: Google's Colossus](https://www.wired.com/2012/07/google-colossus/)
- paper not available
- [ ] 2012: AddressSanitizer: A Fast Address Sanity Checker:
- [paper](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37752.pdf)
- [video](https://www.usenix.org/conference/atc12/technical-sessions/presentation/serebryany)
- [ ] 2013: Spanner: Googles Globally-Distributed Database:
- [paper](http://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf)
- [video](https://www.usenix.org/node/170855)
- [ ] [2014: Machine Learning: The High-Interest Credit Card of Technical Debt](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43146.pdf)
- [ ] [2015: Continuous Pipelines at Google](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43790.pdf)
- [ ] [2015: High-Availability at Massive Scale: Building Googles Data Infrastructure for Ads](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44686.pdf)
- [ ] [2015: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](http://download.tensorflow.org/paper/whitepaper2015.pdf )
- [ ] [2015: How Developers Search for Code: A Case Study](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43835.pdf)
- [ ] [2016: Borg, Omega, and Kubernetes](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44843.pdf)
- ### Testing
- To cover:
- how unit testing works
- what are mock objects
- what is integration testing
- what is dependency injection
- [ ] [Agile Software Testing with James Bach (video)](https://www.youtube.com/watch?v=SAhJf36_u5U)
- [ ] [Open Lecture by James Bach on Software Testing (video)](https://www.youtube.com/watch?v=ILkT_HV9DVU)
- [ ] [Steve Freeman - Test-Driven Development (thats not what we meant) (video)](https://vimeo.com/83960706)
- [slides](http://gotocon.com/dl/goto-berlin-2013/slides/SteveFreeman_TestDrivenDevelopmentThatsNotWhatWeMeant.pdf)
- [ ] [TDD is dead. Long live testing.](http://david.heinemeierhansson.com/2014/tdd-is-dead-long-live-testing.html)
- [ ] [Is TDD dead? (video)](https://www.youtube.com/watch?v=z9quxZsLcfo)
- [ ] [Video series (152 videos) - not all are needed (video)](https://www.youtube.com/watch?v=nzJapzxH_rE&list=PLAwxTw4SYaPkWVHeC_8aSIbSxE_NXI76g)
- [ ] [Test-Driven Web Development with Python](http://www.obeythetestinggoat.com/pages/book.html#toc)
- [ ] Dependency injection:
- [ ] [video](https://www.youtube.com/watch?v=IKD2-MAkXyQ)
- [ ] [Tao Of Testing](http://jasonpolites.github.io/tao-of-testing/ch3-1.1.html)
- [ ] [How to write tests](http://jasonpolites.github.io/tao-of-testing/ch4-1.1.html)
- ### Scheduling
- in an OS, how it works
- can be gleaned from Operating System videos
- ### Implement system routines
- understand what lies beneath the programming APIs you use
- can you implement them?
- ### String searching & manipulations
- [ ] [Sedgewick - Suffix Arrays (video)](https://www.youtube.com/watch?v=HKPrVm5FWvg)
- [ ] [Sedgewick - Substring Search (videos)](https://www.youtube.com/watch?v=2LvvVFCEIv8&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=5)
- [ ] [1. Introduction to Substring Search](https://www.youtube.com/watch?v=2LvvVFCEIv8&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=5)
- [ ] [2. Brute-Force Substring Search](https://www.youtube.com/watch?v=CcDXwIGEXYU&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=4)
- [ ] [3. Knuth-Morris Pratt](https://www.youtube.com/watch?v=n-7n-FDEWzc&index=3&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66)
- [ ] [4. Boyer-Moore](https://www.youtube.com/watch?v=fI7Ch6pZXfM&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=2)
- [ ] [5. Rabin-Karp](https://www.youtube.com/watch?v=QzI0p6zDjK4&index=1&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66)
- [ ] [Search pattern in text (video)](https://www.coursera.org/learn/data-structures/lecture/tAfHI/search-pattern-in-text)
If you need more detail on this subject, see "String Matching" section in [Additional Detail on Some Subjects](#additional-detail-on-some-subjects)
---
Scalability and System Design are very large topics with many topics and resources, since there is a lot to consider
when designing a software/hardware system that can scale. Expect to spend quite a bit of time on this.
You can expect system design questions if you have 4+ years of experience
- ### System Design, Scalability, Data Handling
- Considerations from Yegge:
@ -1147,71 +1131,6 @@ You'll get more graph practice in Skiena's book (see Books section below) and th
- [Design a URL-shortener system: copied from above](http://www.hiredintech.com/system-design/the-system-design-process/)
- [Design a cache system](https://www.adayinthelifeof.nl/2011/02/06/memcache-internals/)
- ### Papers
- These are Google papers and well-known papers.
- Reading all from end to end with full comprehension will likely take more time than you have. I recommend being selective on papers and their sections.
- [ ] [1978: Communicating Sequential Processes](http://spinroot.com/courses/summer/Papers/hoare_1978.pdf)
- [implemented in Go](https://godoc.org/github.com/thomas11/csp)
- [Love classic papers?](https://www.cs.cmu.edu/~crary/819-f09/)
- [ ] [2003: The Google File System](http://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf)
- replaced by Colossus in 2012
- [ ] [2004: MapReduce: Simplified Data Processing on Large Clusters]( http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf)
- mostly replaced by Cloud Dataflow?
- [ ] [2007: What Every Programmer Should Know About Memory (very long, and the author encourages skipping of some sections)](https://www.akkadia.org/drepper/cpumemory.pdf)
- [ ] [2012: Google's Colossus](https://www.wired.com/2012/07/google-colossus/)
- paper not available
- [ ] 2012: AddressSanitizer: A Fast Address Sanity Checker:
- [paper](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37752.pdf)
- [video](https://www.usenix.org/conference/atc12/technical-sessions/presentation/serebryany)
- [ ] 2013: Spanner: Googles Globally-Distributed Database:
- [paper](http://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf)
- [video](https://www.usenix.org/node/170855)
- [ ] [2014: Machine Learning: The High-Interest Credit Card of Technical Debt](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43146.pdf)
- [ ] [2015: Continuous Pipelines at Google](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43790.pdf)
- [ ] [2015: High-Availability at Massive Scale: Building Googles Data Infrastructure for Ads](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44686.pdf)
- [ ] [2015: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](http://download.tensorflow.org/paper/whitepaper2015.pdf )
- [ ] [2015: How Developers Search for Code: A Case Study](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43835.pdf)
- [ ] [2016: Borg, Omega, and Kubernetes](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44843.pdf)
- ### Testing
- To cover:
- how unit testing works
- what are mock objects
- what is integration testing
- what is dependency injection
- [ ] [Agile Software Testing with James Bach (video)](https://www.youtube.com/watch?v=SAhJf36_u5U)
- [ ] [Open Lecture by James Bach on Software Testing (video)](https://www.youtube.com/watch?v=ILkT_HV9DVU)
- [ ] [Steve Freeman - Test-Driven Development (thats not what we meant) (video)](https://vimeo.com/83960706)
- [slides](http://gotocon.com/dl/goto-berlin-2013/slides/SteveFreeman_TestDrivenDevelopmentThatsNotWhatWeMeant.pdf)
- [ ] [TDD is dead. Long live testing.](http://david.heinemeierhansson.com/2014/tdd-is-dead-long-live-testing.html)
- [ ] [Is TDD dead? (video)](https://www.youtube.com/watch?v=z9quxZsLcfo)
- [ ] [Video series (152 videos) - not all are needed (video)](https://www.youtube.com/watch?v=nzJapzxH_rE&list=PLAwxTw4SYaPkWVHeC_8aSIbSxE_NXI76g)
- [ ] [Test-Driven Web Development with Python](http://www.obeythetestinggoat.com/pages/book.html#toc)
- [ ] Dependency injection:
- [ ] [video](https://www.youtube.com/watch?v=IKD2-MAkXyQ)
- [ ] [Tao Of Testing](http://jasonpolites.github.io/tao-of-testing/ch3-1.1.html)
- [ ] [How to write tests](http://jasonpolites.github.io/tao-of-testing/ch4-1.1.html)
- ### Scheduling
- in an OS, how it works
- can be gleaned from Operating System videos
- ### Implement system routines
- understand what lies beneath the programming APIs you use
- can you implement them?
- ### String searching & manipulations
- [ ] [Sedgewick - Suffix Arrays (video)](https://www.youtube.com/watch?v=HKPrVm5FWvg)
- [ ] [Sedgewick - Substring Search (videos)](https://www.youtube.com/watch?v=2LvvVFCEIv8&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=5)
- [ ] [1. Introduction to Substring Search](https://www.youtube.com/watch?v=2LvvVFCEIv8&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=5)
- [ ] [2. Brute-Force Substring Search](https://www.youtube.com/watch?v=CcDXwIGEXYU&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=4)
- [ ] [3. Knuth-Morris Pratt](https://www.youtube.com/watch?v=n-7n-FDEWzc&index=3&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66)
- [ ] [4. Boyer-Moore](https://www.youtube.com/watch?v=fI7Ch6pZXfM&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=2)
- [ ] [5. Rabin-Karp](https://www.youtube.com/watch?v=QzI0p6zDjK4&index=1&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66)
- [ ] [Search pattern in text (video)](https://www.coursera.org/learn/data-structures/lecture/tAfHI/search-pattern-in-text)
If you need more detail on this subject, see "String Matching" section in [Additional Detail on Some Subjects](#additional-detail-on-some-subjects)
---
## Final Review
@ -1678,6 +1597,115 @@ You're never really done.
- ### Augmented Data Structures
- [ ] [CS 61B Lecture 39: Augmenting Data Structures](https://youtu.be/zksIj9O8_jc?list=PL4BBB74C7D2A1049C&t=950)
- ### Tries
- Note there are different kinds of tries. Some have prefixes, some don't, and some use string instead of bits
to track the path.
- I read through code, but will not implement.
- [ ] [Sedgewick - Tries (3 videos)](https://www.youtube.com/playlist?list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ)
- [ ] [1. R Way Tries](https://www.youtube.com/watch?v=buq2bn8x3Vo&index=3&list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ)
- [ ] [2. Ternary Search Tries](https://www.youtube.com/watch?v=LelV-kkYMIg&index=2&list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ)
- [ ] [3. Character Based Operations](https://www.youtube.com/watch?v=00YaFPcC65g&list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ&index=1)
- [ ] [Notes on Data Structures and Programming Techniques](http://www.cs.yale.edu/homes/aspnes/classes/223/notes.html#Tries)
- [ ] Short course videos:
- [ ] [Introduction To Tries (video)](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/08Xyf/core-introduction-to-tries)
- [ ] [Performance Of Tries (video)](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/PvlZW/core-performance-of-tries)
- [ ] [Implementing A Trie (video)](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/DFvd3/core-implementing-a-trie)
- [ ] [The Trie: A Neglected Data Structure](https://www.toptal.com/java/the-trie-a-neglected-data-structure)
- [ ] [TopCoder - Using Tries](https://www.topcoder.com/community/data-science/data-science-tutorials/using-tries/)
- [ ] [Stanford Lecture (real world use case) (video)](https://www.youtube.com/watch?v=TJ8SkcUSdbU)
- [ ] [MIT, Advanced Data Structures, Strings (can get pretty obscure about halfway through)](https://www.youtube.com/watch?v=NinWEPPrkDQ&index=16&list=PLUl4u3cNGP61hsJNdULdudlRL493b-XZf)
- ### Balanced search trees
- Know least one type of balanced binary tree (and know how it's implemented):
- "Among balanced search trees, AVL and 2/3 trees are now passé, and red-black trees seem to be more popular.
A particularly interesting self-organizing data structure is the splay tree, which uses rotations
to move any accessed key to the root." - Skiena
- Of these, I chose to implement a splay tree. From what I've read, you won't implement a
balanced search tree in your interview. But I wanted exposure to coding one up
and let's face it, splay trees are the bee's knees. I did read a lot of red-black tree code.
- splay tree: insert, search, delete functions
If you end up implementing red/black tree try just these:
- search and insertion functions, skipping delete
- I want to learn more about B-Tree since it's used so widely with very large data sets.
- [ ] [Self-balancing binary search tree](https://en.wikipedia.org/wiki/Self-balancing_binary_search_tree)
- [ ] **AVL trees**
- In practice:
From what I can tell, these aren't used much in practice, but I could see where they would be:
The AVL tree is another structure supporting O(log n) search, insertion, and removal. It is more rigidly
balanced than redblack trees, leading to slower insertion and removal but faster retrieval. This makes it
attractive for data structures that may be built once and loaded without reconstruction, such as language
dictionaries (or program dictionaries, such as the opcodes of an assembler or interpreter).
- [ ] [MIT AVL Trees / AVL Sort (video)](https://www.youtube.com/watch?v=FNeL18KsWPc&list=PLUl4u3cNGP61Oq3tWYp6V_F-5jb5L2iHb&index=6)
- [ ] [AVL Trees (video)](https://www.coursera.org/learn/data-structures/lecture/Qq5E0/avl-trees)
- [ ] [AVL Tree Implementation (video)](https://www.coursera.org/learn/data-structures/lecture/PKEBC/avl-tree-implementation)
- [ ] [Split And Merge](https://www.coursera.org/learn/data-structures/lecture/22BgE/split-and-merge)
- [ ] **Splay trees**
- In practice:
Splay trees are typically used in the implementation of caches, memory allocators, routers, garbage collectors,
data compression, ropes (replacement of string used for long text strings), in Windows NT (in the virtual memory,
networking, and file system code) etc.
- [ ] [CS 61B: Splay Trees (video)](https://www.youtube.com/watch?v=Najzh1rYQTo&index=23&list=PL-XXv-cvA_iAlnI-BQr9hjqADPBtujFJd)
- [ ] MIT Lecture: Splay Trees:
- Gets very mathy, but watch the last 10 minutes for sure.
- [Video](https://www.youtube.com/watch?v=QnPl_Y6EqMo)
- [ ] **Red/black trees**
- these are a translation of a 2-3 tree (see below)
- In practice:
Redblack trees offer worst-case guarantees for insertion time, deletion time, and search time.
Not only does this make them valuable in time-sensitive applications such as real-time applications,
but it makes them valuable building blocks in other data structures which provide worst-case guarantees;
for example, many data structures used in computational geometry can be based on redblack trees, and
the Completely Fair Scheduler used in current Linux kernels uses redblack trees. In the version 8 of Java,
the Collection HashMap has been modified such that instead of using a LinkedList to store identical elements with poor
hashcodes, a Red-Black tree is used.
- [ ] [Aduni - Algorithms - Lecture 4 (link jumps to starting point) (video)](https://youtu.be/1W3x0f_RmUo?list=PLFDnELG9dpVxQCxuD-9BSy2E7BWY3t5Sm&t=3871)
- [ ] [Aduni - Algorithms - Lecture 5 (video)](https://www.youtube.com/watch?v=hm2GHwyKF1o&list=PLFDnELG9dpVxQCxuD-9BSy2E7BWY3t5Sm&index=5)
- [ ] [Black Tree](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree)
- [ ] [An Introduction To Binary Search And Red Black Tree](https://www.topcoder.com/community/data-science/data-science-tutorials/an-introduction-to-binary-search-and-red-black-trees/)
- [ ] **2-3 search trees**
- In practice:
2-3 trees have faster inserts at the expense of slower searches (since height is more compared to AVL trees).
- You would use 2-3 tree very rarely because its implementation involves different types of nodes. Instead, people use Red Black trees.
- [ ] [23-Tree Intuition and Definition (video)](https://www.youtube.com/watch?v=C3SsdUqasD4&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6&index=2)
- [ ] [Binary View of 23-Tree](https://www.youtube.com/watch?v=iYvBtGKsqSg&index=3&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6)
- [ ] [2-3 Trees (student recitation) (video)](https://www.youtube.com/watch?v=TOb1tuEZ2X4&index=5&list=PLUl4u3cNGP6317WaSNfmCvGym2ucw3oGp)
- [ ] **2-3-4 Trees (aka 2-4 trees)**
- In practice:
For every 2-4 tree, there are corresponding redblack trees with data elements in the same order. The insertion and deletion
operations on 2-4 trees are also equivalent to color-flipping and rotations in redblack trees. This makes 2-4 trees an
important tool for understanding the logic behind redblack trees, and this is why many introductory algorithm texts introduce
2-4 trees just before redblack trees, even though **2-4 trees are not often used in practice**.
- [ ] [CS 61B Lecture 26: Balanced Search Trees (video)](https://www.youtube.com/watch?v=zqrqYXkth6Q&index=26&list=PL4BBB74C7D2A1049C)
- [ ] [Bottom Up 234-Trees (video)](https://www.youtube.com/watch?v=DQdMYevEyE4&index=4&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6)
- [ ] [Top Down 234-Trees (video)](https://www.youtube.com/watch?v=2679VQ26Fp4&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6&index=5)
- [ ] **N-ary (K-ary, M-ary) trees**
- note: the N or K is the branching factor (max branches)
- binary trees are a 2-ary tree, with branching factor = 2
- 2-3 trees are 3-ary
- [ ] [K-Ary Tree](https://en.wikipedia.org/wiki/K-ary_tree)
- [ ] **B-Trees**
- fun fact: it's a mystery, but the B could stand for Boeing, Balanced, or Bayer (co-inventor)
- In Practice:
B-Trees are widely used in databases. Most modern filesystems use B-trees (or Variants). In addition to
its use in databases, the B-tree is also used in filesystems to allow quick random access to an arbitrary
block in a particular file. The basic problem is turning the file block i address into a disk block
(or perhaps to a cylinder-head-sector) address.
- [ ] [B-Tree](https://en.wikipedia.org/wiki/B-tree)
- [ ] [Introduction to B-Trees (video)](https://www.youtube.com/watch?v=I22wEC1tTGo&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6&index=6)
- [ ] [B-Tree Definition and Insertion (video)](https://www.youtube.com/watch?v=s3bCdZGrgpA&index=7&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6)
- [ ] [B-Tree Deletion (video)](https://www.youtube.com/watch?v=svfnVhJOfMc&index=8&list=PLA5Lqm4uh9Bbq-E0ZnqTIa8LRaL77ica6)
- [ ] [MIT 6.851 - Memory Hierarchy Models (video)](https://www.youtube.com/watch?v=V3omVLzI0WE&index=7&list=PLUl4u3cNGP61hsJNdULdudlRL493b-XZf)
- covers cache-oblivious B-Trees, very interesting data structures
- the first 37 minutes are very technical, may be skipped (B is block size, cache line size)
- ### k-D Trees
- great for finding number of points in a rectangle or higher dimension object
- a good fit for k-nearest neighbors