Coding Interview University
I originally started this as a short to-do list of study topics for becoming a software engineer, but it grew into the large list you see today. After going through this study plan, I was hired as a software developer at Amazon! You probably won't have to study as much as I did. Either way, everything you need is here.
I studied about 8-12 hours a day for several months. This is my story: Why I studied full-time for 8 months for a Google interview
The items listed here will prepare you well for an interview at just about any software company, including the giants: Amazon, Facebook, Google, or Microsoft.
Best of luck to you!
Translations:
Translations in progress:
What is it?
This is my multi-month study plan for going from web developer (self-taught, no CS degree) to software engineer at a large company.
This is meant for new software engineers or those switching from software/web development to software engineering (where computer science knowledge is required). If you claim many years of experience as a software engineer, expect a harder interview.
If you have many years of software/web development experience, note that large software companies like Google, Amazon, Facebook, and Microsoft view software engineering as different from software/web development, and they require computer science knowledge.
If you want to be a reliability engineer or operations engineer, study more from the optional list (networking, security).
Table of Contents
- What is it?
- Why use it?
- How to use it
- Don't feel you aren't smart enough
- About Video Resources
- Interview Process & General Interview Prep
- Pick One Language for the Interview
- Book List
- Before you Get Started
- What you Won't See Covered
- Prerequisite Knowledge
- The Daily Plan
- Algorithmic complexity / Big-O / Asymptotic analysis
- Data Structures
- More Knowledge
- Trees
- Trees - Notes & Background
- Binary search trees: BSTs
- Heap / Priority Queue / Binary Heap
- balanced search trees (general concept, not details)
- traversals: preorder, inorder, postorder, BFS, DFS
- Sorting
- selection
- insertion
- heapsort
- quicksort
- merge sort
- Graphs
- directed
- undirected
- adjacency matrix
- adjacency list
- traversals: BFS, DFS
- Even More Knowledge
- Recursion
- Dynamic Programming
- Object-Oriented Programming
- Design Patterns
- Combinatorics (n choose k) & Probability
- NP, NP-Complete and Approximation Algorithms
- Caches
- Processes and Threads
- Testing
- Scheduling
- String searching & manipulations
- Tries
- Floating Point Numbers
- Unicode
- Endianness
- Networking
- System Design, Scalability, Data Handling (if you have 4+ years experience)
- Final Review
- Coding Question Practice
- Coding exercises/challenges
- Once you're closer to the interview
- Your Resume
- Be thinking of for when the interview comes
- Have questions for the interviewer
- Once You've Got The Job
---------------- Everything below this point is optional ----------------
Additional Resources
- Additional Books
- Additional Learning
- Compilers
- Emacs and vi(m)
- Unix command line tools
- Information theory
- Parity & Hamming Code
- Entropy
- Cryptography
- Compression
- Computer Security
- Garbage collection
- Parallel Programming
- Messaging, Serialization, and Queueing Systems
- A*
- Fast Fourier Transform
- Bloom Filter
- HyperLogLog
- Locality-Sensitive Hashing
- van Emde Boas Trees
- Augmented Data Structures
- Balanced search trees
- AVL trees
- Splay trees
- Red/black trees
- 2-3 search trees
- 2-3-4 Trees (aka 2-4 trees)
- N-ary (K-ary, M-ary) trees
- B-Trees
- k-D Trees
- Skip lists
- Network Flows and Cuts
- Disjoint Sets & Union Find
- Math for Fast Processing
- Treap
- Linear Programming
- Geometry, Convex hull
- Discrete math
- Machine Learning
- Additional Detail on Some Subjects
- Video Series
- Computer Science Courses
- Papers
Why use it?
When I started this project, I didn't know a stack from a heap, didn't know Big-O anything, anything about trees, or how to traverse a graph. If I had to code a sorting algorithm, I can tell you it wouldn't have been very good. Every data structure I had ever used was built into the language, and I had no idea how they worked under the hood. I never had to manage memory unless a process I was running gave an "out of memory" error, and then I'd have to find a workaround. I had used a few multidimensional arrays in my life and thousands of associative arrays, but I had never created a data structure from scratch.
It's a long plan. It may take you months. If you are familiar with a lot of this already, it will take you a lot less time.
How to use it
Everything below is an outline, and you should tackle the items in order from top to bottom.
I'm using GitHub's special markdown flavor, including task lists to track progress.
Create a new branch. To check off an item, just put an x in the brackets: [x]
Fork this repo and run the commands below
git checkout -b progress
git remote add jwasham https://github.com/jwasham/coding-interview-university
git fetch --all
Mark all boxes with an x after you have completed your changes
git add .
git commit -m "Marked x"
git rebase jwasham/master
git push --force
Don't feel you aren't smart enough
- Successful software engineers are smart, but many have an insecurity that they aren't smart enough.
- The myth of the Genius Programmer
- It's Dangerous to Go Alone: Battling the Invisible Monsters in Tech
- Believe you can change
- Think you're not smart enough to work at Google? Well, think again
About Video Resources
Some videos are available only by enrolling in a Coursera or EdX class. These are called MOOCs. Sometimes the classes are not in session, so you have to wait a couple of months and have no access in the meantime.
I'd appreciate your help adding free and always-available public sources, such as YouTube videos, to accompany the online course videos.
I like using university lectures.
Interview Process & General Interview Prep
-
Cracking The Coding Interview Part 1:
-
How to Get a Job at the Big 4:
-
Prep Courses:
- Software Engineer Interview Unleashed (paid course):
- Learn how to prepare for software engineer interviews from a former Google interviewer.
- Python for Data Structures, Algorithms, and Interviews! (paid course):
- A Python-centric course covering data structures, algorithms, mock interviews, and much more.
- Intro to Data Structures and Algorithms using Python! (free course on Udacity):
- A free Python-centric course on data structures and algorithms.
- Data Structures and Algorithms Nanodegree! (paid Nanodegree course on Udacity):
- Get hands-on practice with over 100 data structures and algorithms, with guidance from a dedicated mentor who helps you prepare for interviews and on-the-job scenarios.
Pick One Language for the Interview
You should pick a language you are comfortable with for the coding part of the interview. For large companies, these are solid choices:
- C++
- Java
- Python
You could also use these, but be careful. There may be some caveats:
- JavaScript
- Ruby
Here is an article I wrote about choosing a language for the interview: Pick One Language for the Coding Interview
You need to be very comfortable in the language and be knowledgeable.
Read more about choices:
- http://www.byte-by-byte.com/choose-the-right-language-for-your-coding-interview/
- http://blog.codingforinterviews.com/best-programming-language-jobs/
You'll see some C, C++, and Python learning material included below, because that is what I'm learning. There are a few books involved, see below.
Book List
This is a shorter list than what I actually used. It is abbreviated to save you time.
Interview Prep
- Programming Interviews Exposed: Coding Your Way Through the Interview, 4th Edition
- answers in C++ and Java
- a good warm-up for Cracking the Coding Interview
- not too difficult, most problems are easier than what you'll see in an interview (from what I've read)
- Cracking the Coding Interview, 6th Edition
- answers in Java
If you have tons of extra time:
Choose one:
- Elements of Programming Interviews (C++ version)
- Elements of Programming Interviews (Java version)
Computer Architecture
- Write Great Code: Volume 1: Understanding the Machine
-
The book was published in 2004 and is somewhat outdated, but it's a terrific resource for understanding a computer in brief.
-
The author invented HLA, so take mentions and examples in HLA with a grain of salt. It is not widely used, but it offers decent examples of what assembly code looks like.
-
These chapters are worth the read to give you a nice foundation:
- Chapter 2 - Numeric Representation
- Chapter 3 - Binary Arithmetic and Bit Operations
- Chapter 4 - Floating-Point Representation
- Chapter 5 - Character Representation
- Chapter 6 - Memory Organization and Access
- Chapter 7 - Composite Data Types and Memory Objects
- Chapter 9 - CPU Architecture
- Chapter 10 - Instruction Set Architecture
- Chapter 11 - Memory Architecture and Organization
-
Language Specific
You need to have chosen a language for the interview (see above).
Here are my recommendations by language. I don't have resources for all languages. I welcome additions.
If you read through one of these, you should have all the data structures and algorithms knowledge you'll need to start doing coding problems. You can skip all the video lectures in this project, unless you'd like a review.
Additional language-specific resources here.
C++
I haven't read these two, but they are highly rated and written by Sedgewick. He's awesome.
- Algorithms in C++, Parts 1-4: Fundamentals, Data Structure, Sorting, Searching
- Algorithms in C++ Part 5: Graph Algorithms
If you have a better recommendation for C++, please let me know. I'm looking for a comprehensive resource.
Java
- Algorithms (Sedgewick and Wayne)
- videos with book content (and Sedgewick!) on Coursera:
OR:
- Data Structures and Algorithms in Java
- by Goodrich, Tamassia, Goldwasser
- used as optional text for the CS intro course at UC Berkeley
- see my book report on the Python version below. This book covers the same topics.
Python
- Data Structures and Algorithms in Python
- by Goodrich, Tamassia, Goldwasser
- I loved this book. It covered everything and more.
- Pythonic code
- my glowing book report: https://startupnextdoor.com/book-report-data-structures-and-algorithms-in-python/
Before you Get Started
This list grew over many months, and yes, it kind of got out of hand.
Here are some mistakes I made so you'll have a better experience.
1. You Won't Remember it All
I watched hours of videos and took copious notes, and months later there was much I didn't remember. I spent 3 days going through my notes and making flashcards so I could review.
Please read this so you won't repeat my mistakes:
Retaining Computer Science Knowledge
2. Use Flashcards
To solve the problem, I made a little flashcards site where I could add flashcards of 2 types: general and code. Each card type has its own formatting.
I made a mobile-first website so I could review on my phone or tablet, wherever I am.
Make your own for free:
- Flashcards site repo
- My flashcards database (old - 1200 cards):
- My flashcards database (new - 1800 cards):
Keep in mind I went overboard and have cards covering everything from assembly language and Python trivia to machine learning and statistics. It's far more than what's required.
Note on flashcards: The first time you recognize you know the answer, don't mark it as known. You have to see the same card and answer it correctly several times before you really know it. Repetition will put that knowledge deeper in your brain.
An alternative to my flashcard site is Anki, which has been recommended to me numerous times. It uses a repetition system to help you remember. It's user-friendly, available on all platforms, and has a cloud sync system. It costs $25 on iOS but is free on other platforms.
My flashcard database in Anki format: https://ankiweb.net/shared/info/25173560 (thanks @xiewenya)
3. Review, review, review
I keep a set of cheat sheets on ASCII, OSI stack, Big-O notations, and more. I study them when I have some spare time.
Take a break from programming problems for half an hour and go through your flashcards.
4. Focus
There are a lot of distractions that can take up valuable time. Staying focused and concentrated is hard.
What you Won't See Covered
These are prevalent technologies but not part of this study plan:
- SQL
- JavaScript
- HTML, CSS, and other front-end technologies
The Daily Plan
Some subjects take one day, and some will take multiple days. Some are just learning with nothing to implement.
Each day I take one subject from the list below, watch videos about that subject, and write an implementation in:
- C - using structs and functions that take a struct * and something else as args.
- C++ - without using built-in types
- C++ - using built-in types, like STL's std::list for a linked list
- Python - using built-in types (to keep practicing Python)
- and I write tests to ensure I'm doing it right, sometimes just simple assert() statements
- You may do Java or something else, this is just my thing.
You don't need all of these. You need only one language for the interview.
Why code in all of these?
- Practice, practice, practice, until I'm sick of it and can do it in my sleep (some problems have many edge cases and bookkeeping details to remember)
- Work within the raw constraints (allocating/freeing memory without the help of garbage collection (except Python or Java))
- Make use of built-in types so I have experience using them for real-world use (I'm not going to write my own linked list implementation in production)
I may not have time to do all of these for every subject, but I'll try.
You can see my code here:
You don't need to memorize the guts of every algorithm.
Write code on a whiteboard or paper, not a computer. Test with some sample inputs. Then test it out on a computer.
Prerequisite Knowledge
-
Learn C
- C is everywhere. You'll see examples in books, lectures, videos, everywhere while you're studying.
- The C Programming Language, 2nd Edition
- This is a short book, but it will give you a great handle on the C language, and if you practice a little you'll quickly get proficient. Understanding C helps you understand how programs and memory work.
- answers to questions
-
How computers process a program:
Algorithmic complexity / Big-O / Asymptotic analysis
- Nothing to implement
- There are a lot of videos here. Just watch enough until you understand it. You can always come back and review.
- If some of the lectures are too mathy, you can jump down to the bottom and watch the discrete mathematics videos to get the background knowledge.
- Harvard CS50 - Asymptotic Notation (video)
- Big O Notations (general quick tutorial) (video)
- Big O Notation (and Omega and Theta) - best mathematical explanation (video)
- Skiena:
- A Gentle Introduction to Algorithm Complexity Analysis
- Orders of Growth (video)
- Asymptotics (video)
- UC Berkeley Big O (video)
- UC Berkeley Big Omega (video)
- Amortized Analysis (video)
- Illustrating "Big O" (video)
- TopCoder (includes recurrence relations and master theorem):
- Cheat sheet
Data Structures
-
Arrays
- Implement an automatically resizing vector.
- Description:
- Implement a vector (mutable array with automatic resizing):
- Practice coding using arrays and pointers, and pointer math to jump to an index instead of using indexing.
- new raw data array with allocated memory
- can allocate int array under the hood, just not use its features
- start with 16, or if starting number is greater, use power of 2 - 16, 32, 64, 128
- size() - number of items
- capacity() - number of items it can hold
- is_empty()
- at(index) - returns item at given index, blows up if index out of bounds
- push(item)
- insert(index, item) - inserts item at index, shifts that index's value and trailing elements to the right
- prepend(item) - can use insert above at index 0
- pop() - remove from end, return value
- delete(index) - delete item at index, shifting all trailing elements left
- remove(item) - looks for value and removes index holding it (even if in multiple places)
- find(item) - looks for value and returns first index with that value, -1 if not found
- resize(new_capacity) // private function
- when you reach capacity, resize to double the size
- when popping an item, if size is 1/4 of capacity, resize to half
- Time
- O(1) to add/remove at end (amortized for allocations for more space), index, or update
- O(n) to insert/remove elsewhere
- Space
- contiguous in memory, so proximity helps performance
- space needed = (array capacity, which is >= n) * size of item, but even if 2n, still O(n)
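To make the resizing rules above concrete, here is a minimal Python sketch. It is only one possible reading of the list: a fixed-length Python list stands in for the raw allocated memory, the names `Vector` and `_resize` are my own, and `prepend`, `delete` and `remove` are left out to keep it short.

```python
class Vector:
    """Sketch of a resizable array; a fixed-length list stands in for raw memory."""

    MIN_CAPACITY = 16

    def __init__(self, capacity=MIN_CAPACITY):
        # Round the starting capacity up to a power of 2, never below 16.
        cap = self.MIN_CAPACITY
        while cap < capacity:
            cap *= 2
        self._capacity = cap
        self._size = 0
        self._data = [None] * cap

    def size(self):
        return self._size

    def capacity(self):
        return self._capacity

    def is_empty(self):
        return self._size == 0

    def at(self, index):
        if not 0 <= index < self._size:
            raise IndexError("index out of bounds")
        return self._data[index]

    def push(self, item):
        if self._size == self._capacity:
            self._resize(self._capacity * 2)
        self._data[self._size] = item
        self._size += 1

    def insert(self, index, item):
        if not 0 <= index <= self._size:
            raise IndexError("index out of bounds")
        if self._size == self._capacity:
            self._resize(self._capacity * 2)
        # Shift the trailing elements one slot to the right.
        for i in range(self._size, index, -1):
            self._data[i] = self._data[i - 1]
        self._data[index] = item
        self._size += 1

    def pop(self):
        if self._size == 0:
            raise IndexError("pop from empty vector")
        self._size -= 1
        item = self._data[self._size]
        self._data[self._size] = None
        # Shrink to half once only a quarter of the capacity is in use.
        if self._capacity > self.MIN_CAPACITY and self._size <= self._capacity // 4:
            self._resize(self._capacity // 2)
        return item

    def find(self, item):
        for i in range(self._size):
            if self._data[i] == item:
                return i
        return -1

    def _resize(self, new_capacity):
        # Private helper: copy the live elements into a new backing array.
        new_data = [None] * new_capacity
        for i in range(self._size):
            new_data[i] = self._data[i]
        self._data = new_data
        self._capacity = new_capacity
```

Doubling on growth is what makes `push` O(1) amortized: each element is copied a constant number of times on average.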
-
Linked Lists
- Description:
- C Code (video) - not the whole video, just portions about Node struct and memory allocation.
- Linked List vs Arrays:
- why you should avoid linked lists (video)
- Gotcha: you need pointer to pointer knowledge: (for when you pass a pointer to a function that may change the address where that pointer points) This page is just to get a grasp on ptr to ptr. I don't recommend this list traversal style. Readability and maintainability suffer due to cleverness.
- implement (I did with tail pointer & without):
- size() - returns number of data elements in list
- empty() - bool returns true if empty
- value_at(index) - returns the value of the nth item (starting at 0 for first)
- push_front(value) - adds an item to the front of the list
- pop_front() - remove front item and return its value
- push_back(value) - adds an item at the end
- pop_back() - removes end item and returns its value
- front() - get value of front item
- back() - get value of end item
- insert(index, value) - insert value at index, so current item at that index is pointed to by new item at index
- erase(index) - removes node at given index
- value_n_from_end(n) - returns the value of the node at nth position from the end of the list
- reverse() - reverses the list
- remove_value(value) - removes the first item in the list with this value
- Doubly-linked List
- Description (video)
- No need to implement
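As a rough illustration of the singly-linked list operations listed above, here is a Python sketch without a tail pointer. The class names, the `to_list` helper, and the choice that `value_n_from_end(1)` means the last node are my assumptions, not part of the original list.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class LinkedList:
    def __init__(self):
        self.head = None
        self._size = 0

    def size(self):
        return self._size

    def empty(self):
        return self.head is None

    def push_front(self, value):
        self.head = Node(value, self.head)
        self._size += 1

    def pop_front(self):
        if self.head is None:
            raise IndexError("pop from empty list")
        value = self.head.value
        self.head = self.head.next
        self._size -= 1
        return value

    def push_back(self, value):
        node = Node(value)
        if self.head is None:
            self.head = node
        else:
            cur = self.head
            while cur.next is not None:  # O(n) without a tail pointer
                cur = cur.next
            cur.next = node
        self._size += 1

    def value_n_from_end(self, n):
        # Assumption: n = 1 means the last node.
        if n < 1:
            raise IndexError("n must be >= 1")
        lead = self.head
        for _ in range(n):
            if lead is None:
                raise IndexError("list is shorter than n")
            lead = lead.next
        trail = self.head
        # Move both pointers until the lead one falls off the end.
        while lead is not None:
            lead = lead.next
            trail = trail.next
        return trail.value

    def reverse(self):
        prev = None
        cur = self.head
        while cur is not None:
            nxt = cur.next
            cur.next = prev  # flip the link
            prev = cur
            cur = nxt
        self.head = prev

    def to_list(self):
        # Hypothetical helper, only here to make testing easy.
        out = []
        cur = self.head
        while cur is not None:
            out.append(cur.value)
            cur = cur.next
        return out
```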
-
Stack
- Stacks (video)
- Using Stacks Last-In First-Out (video)
- Will not implement. Implementing with an array is trivial.
-
Queue
- Using Queues First-In First-Out (video)
- Queue (video)
- Circular buffer/FIFO
- Priority Queues (video)
- Implement using linked-list, with tail pointer:
- enqueue(value) - adds value at position at tail
- dequeue() - returns value and removes least recently added element (front)
- empty()
- Implement using fixed-sized array:
- enqueue(value) - adds item at end of available storage
- dequeue() - returns value and removes least recently added element
- empty()
- full()
- Cost:
- a bad implementation using linked list where you enqueue at head and dequeue at tail would be O(n) because you'd need the next to last element, causing a full traversal each dequeue
- enqueue: O(1) (amortized, linked list and array [probing])
- dequeue: O(1) (linked list and array)
- empty: O(1) (linked list and array)
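The fixed-sized array variant above is usually built as a circular buffer. Here is a small Python sketch of that idea; the class name `ArrayQueue` and the choice of exceptions are my own assumptions.

```python
class ArrayQueue:
    """Fixed-capacity FIFO queue backed by a circular buffer."""

    def __init__(self, capacity):
        self._data = [None] * capacity
        self._head = 0   # index of the least recently added element
        self._count = 0  # number of stored elements

    def empty(self):
        return self._count == 0

    def full(self):
        return self._count == len(self._data)

    def enqueue(self, value):
        if self.full():
            raise OverflowError("queue is full")
        # The tail wraps around to the start of the array.
        tail = (self._head + self._count) % len(self._data)
        self._data[tail] = value
        self._count += 1

    def dequeue(self):
        if self.empty():
            raise IndexError("dequeue from empty queue")
        value = self._data[self._head]
        self._data[self._head] = None
        self._head = (self._head + 1) % len(self._data)
        self._count -= 1
        return value
```

Because only the two indices move, both `enqueue` and `dequeue` stay O(1) with no shifting of elements.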
-
Hash table
-
Videos:
-
Online Courses:
-
implement with array using linear probing
- hash(k, m) - m is size of hash table
- add(key, value) - if key already exists, update value
- exists(key)
- get(key)
- remove(key)
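Here is one way the linear-probing hash table above could look in Python. Treat it as a sketch under assumptions: it uses Python's built-in `hash()`, marks removed slots with a tombstone sentinel, and does not resize, so it raises once the table is full.

```python
_EMPTY = object()    # slot never used
_DELETED = object()  # tombstone left behind by remove()


class HashTable:
    def __init__(self, m=8):
        self._m = m
        self._keys = [_EMPTY] * m
        self._values = [None] * m

    def hash(self, k, m):
        return hash(k) % m

    def _find_slot(self, key):
        """Return (index holding key or None, first reusable index or None)."""
        start = self.hash(key, self._m)
        first_free = None
        for step in range(self._m):
            i = (start + step) % self._m  # linear probing
            slot = self._keys[i]
            if slot is _EMPTY:
                if first_free is None:
                    first_free = i
                return None, first_free
            if slot is _DELETED:
                if first_free is None:
                    first_free = i
            elif slot == key:
                return i, first_free
        return None, first_free

    def add(self, key, value):
        found, free = self._find_slot(key)
        if found is not None:
            self._values[found] = value  # key exists: update
            return
        if free is None:
            raise OverflowError("hash table is full")
        self._keys[free] = key
        self._values[free] = value

    def exists(self, key):
        return self._find_slot(key)[0] is not None

    def get(self, key):
        found, _ = self._find_slot(key)
        if found is None:
            raise KeyError(key)
        return self._values[found]

    def remove(self, key):
        found, _ = self._find_slot(key)
        if found is None:
            raise KeyError(key)
        # A tombstone keeps later keys in the same probe chain reachable.
        self._keys[found] = _DELETED
        self._values[found] = None
```

The tombstone is the part people tend to forget: simply emptying the slot would cut the probe chain and make colliding keys unreachable.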
-
More Knowledge
-
Binary search
- Binary Search (video)
- Binary Search (video)
- detail
- Implement:
- binary search (on sorted array of integers)
- binary search using recursion
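Both variants above can be sketched in a few lines of Python. Returning -1 for a missing value is my choice here, made to match the `find` convention used for the vector earlier.

```python
def binary_search(items, target):
    """Iterative binary search on a sorted list; returns an index or -1."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def binary_search_recursive(items, target, lo=0, hi=None):
    """Same search, expressed with recursion."""
    if hi is None:
        hi = len(items) - 1
    if lo > hi:
        return -1
    mid = lo + (hi - lo) // 2
    if items[mid] == target:
        return mid
    if items[mid] < target:
        return binary_search_recursive(items, target, mid + 1, hi)
    return binary_search_recursive(items, target, lo, mid - 1)
```

Writing the midpoint as `lo + (hi - lo) // 2` does not matter in Python, but it is the habit that avoids integer overflow in C, C++ and Java.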
-
Bitwise operations
- Bits cheat sheet - you should know many of the powers of 2 from (2^1 to 2^16 and 2^32)
- Get a really good understanding of manipulating bits with: &, |, ^, ~, >>, <<
- 2s and 1s complement
- count set bits
- round to next power of 2:
- swap values:
- absolute value:
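The bit tricks named above can be tried out in Python. These are sketches under stated assumptions: Python integers are unbounded, so the power-of-2 and absolute-value helpers assume inputs that fit in 32 bits, and all function names are my own.

```python
def count_set_bits(n):
    """Count 1 bits of a non-negative integer (Kernighan's method)."""
    count = 0
    while n:
        n &= n - 1  # clears the lowest set bit
        count += 1
    return count


def next_power_of_2(n):
    """Round n (1 <= n <= 2**31) up to a power of 2; powers of 2 map to themselves."""
    n -= 1
    n |= n >> 1
    n |= n >> 2
    n |= n >> 4
    n |= n >> 8
    n |= n >> 16
    return n + 1


def xor_swap(a, b):
    """Swap two integers without a temporary variable."""
    a ^= b
    b ^= a
    a ^= b
    return a, b


def abs_32(x):
    """Branch-free absolute value, assuming x fits in a signed 32-bit int."""
    mask = x >> 31  # 0 for non-negative x, -1 (all ones) for negative x
    return (x + mask) ^ mask
```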
Trees
-
Trees - Notes & Background
- Series: Core Trees (video)
- Series: Trees (video)
- basic tree construction
- traversal
- manipulation algorithms
- BFS(breadth-first search) and DFS(depth-first search) (video)
- BFS notes:
- level order (BFS, using queue)
- time complexity: O(n)
- space complexity: best: O(1), worst: O(n/2)=O(n)
- DFS notes:
- time complexity: O(n)
- space complexity: best: O(log n) - avg. height of tree, worst: O(n)
- inorder (DFS: left, self, right)
- postorder (DFS: left, right, self)
- preorder (DFS: self, left, right)
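The traversal orders in these notes can be written down as a short Python sketch. The `TreeNode` class and the function names are assumptions for illustration; the recursive versions build new lists for clarity rather than speed.

```python
from collections import deque


class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def level_order(root):
    """BFS: visit nodes level by level using a queue."""
    order = []
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        order.append(node.value)
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)
    return order


def inorder(node):
    """DFS: left, self, right."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)


def preorder(node):
    """DFS: self, left, right."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)


def postorder(node):
    """DFS: left, right, self."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.value]
```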
-
Binary search trees: BSTs
- Binary Search Tree Review (video)
- Series (video)
- starts with symbol table and goes through BST applications
- Introduction (video)
- MIT (video)
- C/C++:
- Binary search tree - Implementation in C/C++ (video)
- BST implementation - memory allocation in stack and heap (video)
- Find min and max element in a binary search tree (video)
- Find height of a binary tree (video)
- Binary tree traversal - breadth-first and depth-first strategies (video)
- Binary tree: Level Order Traversal (video)
- Binary tree traversal: Preorder, Inorder, Postorder (video)
- Check if a binary tree is binary search tree or not (video)
- Delete a node from Binary Search Tree (video)
- Inorder Successor in a binary search tree (video)
- Implement:
- insert // insert value into tree
- get_node_count // get count of values stored
- print_values // prints the values in the tree, from min to max
- delete_tree
- is_in_tree // returns true if given value exists in the tree
- get_height // returns the height in nodes (single node's height is 1)
- get_min // returns the minimum value stored in the tree
- get_max // returns the maximum value stored in the tree
- is_binary_search_tree
- delete_value
- get_successor // returns next-highest value in tree after given value, -1 if none
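A few of the BST functions above, sketched in Python. This is not a full solution: deletion is left out, duplicates are silently ignored (my assumption), and the functions take the root node as their first argument.

```python
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    """Insert value and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return BSTNode(value)
    if value < root.value:
        root.left = insert(root.left, value)
    elif value > root.value:
        root.right = insert(root.right, value)
    return root


def is_in_tree(root, value):
    while root is not None:
        if value == root.value:
            return True
        root = root.left if value < root.value else root.right
    return False


def get_height(root):
    """Height in nodes: an empty tree is 0, a single node is 1."""
    if root is None:
        return 0
    return 1 + max(get_height(root.left), get_height(root.right))


def get_min(root):
    while root.left is not None:
        root = root.left
    return root.value


def is_binary_search_tree(root, low=None, high=None):
    """Every node must lie strictly between the bounds set by its ancestors."""
    if root is None:
        return True
    if low is not None and root.value <= low:
        return False
    if high is not None and root.value >= high:
        return False
    return (is_binary_search_tree(root.left, low, root.value)
            and is_binary_search_tree(root.right, root.value, high))


def get_successor(root, value):
    """Next-highest value after the given value, or -1 if there is none."""
    successor = -1
    while root is not None:
        if value < root.value:
            successor = root.value  # candidate; look for a smaller one
            root = root.left
        else:
            root = root.right
    return successor
```

Checking only each node against its direct children is the classic bug in `is_binary_search_tree`; passing bounds down the tree avoids it.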
-
Heap / Priority Queue / Binary Heap
- visualized as a tree, but is usually linear in storage (array, linked list)
- Heap
- Introduction (video)
- Naive Implementations (video)
- Binary Trees (video)
- Tree Height Remark (video)
- Basic Operations (video)
- Complete Binary Trees (video)
- Pseudocode (video)
- Heap Sort - jumps to start (video)
- Heap Sort (video)
- Building a heap (video)
- MIT: Heaps and Heap Sort (video)
- CS 61B Lecture 24: Priority Queues (video)
- Linear Time BuildHeap (max-heap)
- Implement a max-heap:
- insert
- sift_up - needed for insert
- get_max - returns the max item, without removing it
- get_size() - return number of elements stored
- is_empty() - returns true if heap contains no elements
- extract_max - returns the max item, removing it
- sift_down - needed for extract_max
- remove(i) - removes item at index i
- heapify - create a heap from an array of elements, needed for heap_sort
- heap_sort() - take an unsorted array and turn it into a sorted array in-place using a max heap
- note: using a min heap instead would save operations, but double the space needed (cannot do in-place).
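Here is a Python sketch of the max-heap functions listed above, stored in a plain list where the children of index i live at 2i + 1 and 2i + 2. Splitting the work into module-level helpers plus a small `MaxHeap` class is my own layout, not something the list prescribes.

```python
def sift_up(a, i):
    """Move a[i] up until its parent is at least as large."""
    while i > 0:
        parent = (i - 1) // 2
        if a[parent] >= a[i]:
            return
        a[i], a[parent] = a[parent], a[i]
        i = parent


def sift_down(a, i, size):
    """Move a[i] down within a[:size] until both children are no larger."""
    while True:
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def heapify(a):
    """Build a max-heap in place in O(n) by sifting down from the last parent."""
    for i in range(len(a) // 2 - 1, -1, -1):
        sift_down(a, i, len(a))


def heap_sort(a):
    """Sort a in place, ascending, using a max-heap."""
    heapify(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]  # move the current max to its final slot
        sift_down(a, 0, end)


class MaxHeap:
    def __init__(self):
        self._a = []

    def insert(self, value):
        self._a.append(value)
        sift_up(self._a, len(self._a) - 1)

    def get_max(self):
        return self._a[0]

    def get_size(self):
        return len(self._a)

    def is_empty(self):
        return not self._a

    def extract_max(self):
        a = self._a
        a[0], a[-1] = a[-1], a[0]
        top = a.pop()
        sift_down(a, 0, len(a))
        return top

    def remove(self, i):
        a = self._a
        a[i], a[-1] = a[-1], a[i]
        a.pop()
        if i < len(a):
            # The moved element may need to travel in either direction.
            sift_up(a, i)
            sift_down(a, i, len(a))
```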
Sorting
-
Notes:
- Implement sorts & know best case/worst case, average complexity of each:
- no bubble sort - it's terrible - O(n^2), except when n <= 16
- stability in sorting algorithms ("Is Quicksort stable?")
- Which algorithms can be used on linked lists? Which on arrays? Which on both?
- I wouldn't recommend sorting a linked list, but merge sort is doable.
- Merge Sort For Linked List
-
For heapsort, see Heap data structure above. Heap sort is great, but not stable.
-
UC Berkeley:
-
Merge Sort code:
-
Quick Sort code:
-
Implement:
- Mergesort: O(n log n) average and worst case
- Quicksort O(n log n) average case
- Selection sort and insertion sort are both O(n^2) average and worst case
- For heapsort, see Heap data structure above.
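For reference, here is a compact Python sketch of merge sort, quicksort and insertion sort. The details are my choices rather than the only way to do it: merge sort returns a new list, quicksort uses a random pivot with Lomuto partitioning, and insertion sort works in place.

```python
import random


def merge_sort(a):
    """Stable, O(n log n) in the average and worst case; returns a new list."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal items in original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def quick_sort(a, lo=0, hi=None):
    """In place, O(n log n) on average, O(n^2) in the worst case."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # A random pivot makes the worst case unlikely on already sorted input.
    p = random.randint(lo, hi)
    a[p], a[hi] = a[hi], a[p]
    pivot = a[hi]
    store = lo
    for i in range(lo, hi):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]
    quick_sort(a, lo, store - 1)
    quick_sort(a, store + 1, hi)


def insertion_sort(a):
    """In place, O(n^2) average and worst case, fast on nearly sorted input."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]  # shift larger items right
            j -= 1
        a[j + 1] = key
```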
-
Not required, but I recommend them:
As a summary, here is a visual representation of 15 sorting algorithms. If you need more detail on this subject, see "Sorting" section in Additional Detail on Some Subjects
Graphs
Graphs can be used to represent many problems in computer science, so this section is long, like trees and sorting were.
-
Notes:
- There are 4 basic ways to represent a graph in memory:
- objects and pointers
- adjacency matrix
- adjacency list
- adjacency map
- Familiarize yourself with each representation and its pros & cons
- BFS and DFS - know their computational complexity, their tradeoffs, and how to implement them in real code
- When asked a question, look for a graph-based solution first, then move on if none.
-
MIT(videos):
-
Skiena Lectures - great intro:
- CSE373 2012 - Lecture 11 - Graph Data Structures (video)
- CSE373 2012 - Lecture 12 - Breadth-First Search (video)
- CSE373 2012 - Lecture 13 - Graph Algorithms (video)
- CSE373 2012 - Lecture 14 - Graph Algorithms (con't) (video)
- CSE373 2012 - Lecture 15 - Graph Algorithms (con't 2) (video)
- CSE373 2012 - Lecture 16 - Graph Algorithms (con't 3) (video)
-
Graphs (review and more):
- 6.006 Single-Source Shortest Paths Problem (video)
- 6.006 Dijkstra (video)
- 6.006 Bellman-Ford (video)
- 6.006 Speeding Up Dijkstra (video)
- Aduni: Graph Algorithms I - Topological Sorting, Minimum Spanning Trees, Prim's Algorithm - Lecture 6 (video)
- Aduni: Graph Algorithms II - DFS, BFS, Kruskal's Algorithm, Union Find Data Structure - Lecture 7 (video)
- Aduni: Graph Algorithms III: Shortest Path - Lecture 8 (video)
- Aduni: Graph Alg. IV: Intro to geometric algorithms - Lecture 9 (video)
- CS 61B 2014 (starting at 58:09) (video)
- CS 61B 2014: Weighted graphs (video)
- Greedy Algorithms: Minimum Spanning Tree (video)
- Strongly Connected Components Kosaraju's Algorithm Graph Algorithm (video)
-
Full Coursera Course:
-
I'll implement:
- DFS with adjacency list (recursive)
- DFS with adjacency list (iterative with stack)
- DFS with adjacency matrix (recursive)
- DFS with adjacency matrix (iterative with stack)
- BFS with adjacency list
- BFS with adjacency matrix
- single-source shortest path (Dijkstra)
- minimum spanning tree
- DFS-based algorithms (see Aduni videos above):
- check for cycle (needed for topological sort, since we'll check for cycle before starting)
- topological sort
- count connected components in a graph
- list strongly connected components
- check for bipartite graph
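A few items from the implementation list above, sketched in Python with an adjacency list. The assumptions: the graph is a dict mapping each vertex to a list of neighbors, every vertex appears as a key, and the function names are mine.

```python
from collections import deque


def bfs(graph, start):
    """Breadth-first order of the vertices reachable from start."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return order


def dfs_recursive(graph, start, visited=None, order=None):
    if visited is None:
        visited, order = set(), []
    visited.add(start)
    order.append(start)
    for w in graph[start]:
        if w not in visited:
            dfs_recursive(graph, w, visited, order)
    return order


def dfs_iterative(graph, start):
    visited = set()
    order = []
    stack = [start]
    while stack:
        v = stack.pop()
        if v in visited:
            continue
        visited.add(v)
        order.append(v)
        # Push in reverse so neighbors are visited in list order.
        for w in reversed(graph[v]):
            if w not in visited:
                stack.append(w)
    return order


def topological_sort(graph):
    """DFS-based topological order; raises ValueError if a cycle is found."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {v: WHITE for v in graph}
    order = []

    def visit(v):
        color[v] = GRAY
        for w in graph[v]:
            if color[w] == GRAY:
                raise ValueError("graph has a cycle")
            if color[w] == WHITE:
                visit(w)
        color[v] = BLACK
        order.append(v)

    for v in graph:
        if color[v] == WHITE:
            visit(v)
    order.reverse()
    return order
```

Both traversals run in O(V + E) with an adjacency list; the same code over an adjacency matrix would cost O(V^2) because finding neighbors means scanning a full row.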
Even More Knowledge
-
Recursion
- Stanford lectures on recursion & backtracking:
- when it is appropriate to use it
- how is tail recursion better than not?
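To illustrate the tail recursion question, here is a small Python comparison. One caveat I am sure of: CPython does not eliminate tail calls, so the tail-recursive version below still uses one stack frame per call. The loop shows what a compiler that does optimize tail calls would effectively turn it into.

```python
def factorial(n):
    """Not tail-recursive: the multiplication happens after the call returns."""
    if n <= 1:
        return 1
    return n * factorial(n - 1)


def factorial_tail(n, acc=1):
    """Tail-recursive: the recursive call is the very last thing that happens."""
    if n <= 1:
        return acc
    return factorial_tail(n - 1, acc * n)


def factorial_loop(n):
    """What tail-call optimization would produce: constant stack space."""
    acc = 1
    while n > 1:
        acc *= n
        n -= 1
    return acc
```

The benefit of the tail form is that nothing remains to be done in the caller, so its stack frame can be reused in languages that support the optimization.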
-
Dynamic Programming
- You probably won't see any dynamic programming problems in your interview, but it's worth being able to recognize a problem as being a candidate for dynamic programming.
- This subject can be pretty difficult, as each DP-solvable problem must be defined as a recurrence relation, and coming up with it can be tricky.
- I suggest looking at many examples of DP problems until you have a solid understanding of the pattern involved.
- Videos:
- the Skiena videos can be hard to follow since he sometimes uses the whiteboard, which is too small to see
- Skiena: CSE373 2012 - Lecture 19 - Introduction to Dynamic Programming (video)
- Skiena: CSE373 2012 - Lecture 20 - Edit Distance (video)
- Skiena: CSE373 2012 - Lecture 21 - Dynamic Programming Examples (video)
- Skiena: CSE373 2012 - Lecture 22 - Applications of Dynamic Programming (video)
- Simonson: Dynamic Programming 0 (starts at 59:18) (video)
- Simonson: Dynamic Programming I - Lecture 11 (video)
- Simonson: Dynamic programming II - Lecture 12 (video)
- List of individual DP problems (each is short): Dynamic Programming (video)
- Yale Lecture notes:
- Coursera:
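Edit distance, the topic of one of the Skiena lectures above, makes a good first DP example. This is a minimal bottom-up sketch assuming unit cost for insert, delete and substitute (Levenshtein distance); the function name is mine.

```python
def edit_distance(source, target):
    """Minimum number of single-character edits to turn source into target."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters
    for j in range(cols):
        dist[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                # delete from source
                dist[i][j - 1] + 1,                # insert into source
                dist[i - 1][j - 1] + substitution  # match or substitute
            )
    return dist[-1][-1]
```

The pattern to recognize: the answer for a pair of prefixes depends only on answers for shorter prefixes, so a table filled in order replaces exponential recursion with O(len(source) * len(target)) work.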
-
Object-Oriented Programming
- Optional: UML 2.0 Series (video)
- Object-Oriented Software Engineering: Software Dev Using UML and Java (21 videos):
- Can skip this if you have a great grasp of OO and OO design practices.
- OOSE: Software Dev Using UML and Java (video)
- SOLID OOP Principles:
- Bob Martin SOLID Principles of Object Oriented and Agile Design (video)
- SOLID Principles (video)
- S - Single Responsibility Principle | Single responsibility to each Object
- O - Open/Closed Principle | On production level Objects are ready for extension but not for modification
- L - Liskov Substitution Principle | Base Class and Derived class follow 'IS A' principle
- I - Interface Segregation Principle | clients should not be forced to implement interfaces they don't use
- D - Dependency Inversion Principle | Reduce the dependency in composition of objects.
-
Design patterns
- Quick UML review (video)
- Learn these patterns:
- strategy
- singleton
- adapter
- prototype
- decorator
- visitor
- factory, abstract factory
- facade
- observer
- proxy
- delegate
- command
- state
- memento
- iterator
- composite
- flyweight
- Chapter 6 (Part 1) - Patterns (video)
- Chapter 6 (Part 2) - Abstraction-Occurrence, General Hierarchy, Player-Role, Singleton, Observer, Delegation (video)
- Chapter 6 (Part 3) - Adapter, Facade, Immutable, Read-Only Interface, Proxy (video)
- Series of videos (27 videos)
- Head First Design Patterns
- I know the canonical book is "Design Patterns: Elements of Reusable Object-Oriented Software", but Head First is great for beginners to OO.
- Handy reference: 101 Design Patterns & Tips for Developers
- Design patterns for humans
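Two of the patterns from the list above, strategy and observer, fit in a few lines of Python. This is only an illustrative sketch: the pricing example and all names are made up, and in Python a plain function often plays the role that a strategy or observer class would in Java or C++.

```python
# Strategy: the algorithm is passed in and can be swapped at runtime.
def no_discount(amount_cents):
    return amount_cents


def ten_percent_off(amount_cents):
    return amount_cents * 90 // 100


def total_price(prices_cents, discount_strategy):
    return discount_strategy(sum(prices_cents))


# Observer: a subject notifies every registered callback about events.
class Subject:
    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def detach(self, observer):
        self._observers.remove(observer)

    def notify(self, event):
        # Iterate over a copy so observers may detach themselves safely.
        for observer in list(self._observers):
            observer(event)
```

Usage: `total_price([1000, 2000], ten_percent_off)` applies the discount, while `subject.attach(log.append)` makes a list collect every event passed to `subject.notify`.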
-
Combinatorics (n choose k) & Probability
- Math Skills: How to find Factorial, Permutation and Combination (Choose) (video)
- Make School: Probability (video)
- Make School: More Probability and Markov Chains (video)
- Khan Academy:
- Course layout:
- Just the videos - 41 (each is simple and short):
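As a quick check on the n choose k material, the sketch below computes combinations from the factorial definition and compares them with Python's built-in `math.comb` and `math.perm` (available from Python 3.8). The helper name `choose` and the coin-flip example are my own.

```python
from fractions import Fraction
from math import comb, factorial, perm


def choose(n: int, k: int) -> int:
    """n choose k from the definition n! / (k! * (n - k)!)."""
    if k < 0 or k > n:
        return 0
    return factorial(n) // (factorial(k) * factorial(n - k))


print(choose(5, 2), comb(5, 2))  # 10 10  (order does not matter)
print(perm(5, 2))                # 20     (order matters)

# Probability of exactly 2 heads in 5 fair coin flips:
# favorable outcomes / total outcomes = C(5, 2) / 2^5
print(Fraction(comb(5, 2), 2 ** 5))  # 5/16
```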
-
NP, NP-Complete and Approximation Algorithms
- Know about the most famous classes of NP-complete problems, such as traveling salesman and the knapsack problem, and be able to recognize them when an interviewer asks you them in disguise.
- Know what NP-complete means.
- Computational Complexity (video)
- Simonson:
- Skiena:
- Complexity: P, NP, NP-completeness, Reductions (video)
- Complexity: Approximation Algorithms (video)
- Complexity: Fixed-Parameter Algorithms (video)
- Peter Norvig discusses near-optimal solutions to traveling salesman problem:
- Pages 1048 - 1140 in CLRS if you have it.
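To get a feel for why the knapsack problem is hard in general, the sketch below compares a brute-force search over all subsets (exponential in the number of items) with the classic dynamic programming table, which is pseudo-polynomial: its running time depends on the numeric capacity, not just the input length. Function names and the sample items are illustrative only.

```python
from itertools import combinations
from typing import List, Tuple

Item = Tuple[int, int]  # (weight, value)


def knapsack_brute_force(items: List[Item], capacity: int) -> int:
    """Try every subset: O(2^n) subsets for n items."""
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(w for w, _ in subset) <= capacity:
                best = max(best, sum(v for _, v in subset))
    return best


def knapsack_dp(items: List[Item], capacity: int) -> int:
    """0/1 knapsack in O(n * capacity) time, O(capacity) space."""
    best = [0] * (capacity + 1)  # best[c] = max value within weight c
    for weight, value in items:
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


items = [(1, 1), (3, 4), (4, 5), (5, 7)]
print(knapsack_brute_force(items, 7))  # 9
print(knapsack_dp(items, 7))           # 9
```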
-
Caches
-
Processes and Threads
- Computer Science 162 - Operating Systems (25 videos):
- for processes and threads see videos 1-11
- Operating Systems and System Programming (video)
- What Is The Difference Between A Process And A Thread?
- Covers:
- Processes, Threads, Concurrency issues
- difference between processes and threads
- processes
- threads
- locks
- mutexes
- semaphores
- monitors
- how they work
- deadlock
- livelock
- CPU activity, interrupts, context switching
- Modern concurrency constructs with multicore processors
- Paging, segmentation and virtual memory (video)
- Interrupts (video)
- Scheduling (video)
- Process resource needs (memory: code, static storage, stack, heap, and also file descriptors, i/o)
- Thread resource needs (shares the above, minus the stack, with other threads in the same process, but each has its own pc, stack pointer, registers, and stack)
- Forking is really copy-on-write: parent and child share read-only pages until one of them writes to memory, at which point the affected page is copied.
- Context switching
- How context switching is initiated by the operating system and underlying hardware
- threads in C++ (series - 10 videos)
- concurrency in Python (videos):
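Locks and mutexes from the list above are easiest to understand with a shared counter. The sketch below uses Python's `threading.Lock`; the `Counter` class and `run_workers` helper are hypothetical names. Without the lock, `value += 1` is a read-modify-write sequence that can interleave between threads and lose updates.

```python
import threading


class Counter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # The lock makes the read-modify-write below a critical section.
        with self._lock:
            self.value += 1


def run_workers(num_threads: int = 4, increments: int = 10_000) -> int:
    counter = Counter()

    def work() -> None:
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the result
    return counter.value


print(run_workers())  # 40000
```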
-
Testing
- To cover:
- how unit testing works
- what are mock objects
- what is integration testing
- what is dependency injection
- Agile Software Testing with James Bach (video)
- Open Lecture by James Bach on Software Testing (video)
- Steve Freeman - Test-Driven Development (that’s not what we meant) (video)
- TDD is dead. Long live testing.
- Is TDD dead? (video)
- Video series (152 videos) - not all are needed (video)
- Test-Driven Web Development with Python
- Dependency injection:
- How to write tests
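Unit tests, mock objects, and dependency injection can be seen together in one small sketch. `SignupService` and its `mailer` collaborator are invented for this example; `Mock` comes from the standard library's `unittest.mock`. The service receives its mailer from outside (dependency injection), so the test can pass in a mock and inspect how it was called.

```python
from unittest.mock import Mock


class SignupService:
    def __init__(self, mailer) -> None:
        self._mailer = mailer  # injected, not created here

    def register(self, email: str) -> bool:
        if "@" not in email:
            return False
        self._mailer.send(email, "Welcome!")
        return True


def test_register_sends_welcome_mail() -> None:
    mailer = Mock()
    service = SignupService(mailer)
    assert service.register("ada@example.com") is True
    mailer.send.assert_called_once_with("ada@example.com", "Welcome!")


def test_register_rejects_bad_address() -> None:
    mailer = Mock()
    assert SignupService(mailer).register("not-an-address") is False
    mailer.send.assert_not_called()


test_register_sends_welcome_mail()
test_register_rejects_bad_address()
print("all tests passed")
```

An integration test would instead wire `SignupService` to a real mailer (or a local test server) and check the pieces working together.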
-
Scheduling
- in an OS, how it works
- can be gleaned from Operating System videos
-
String searching & manipulations
- Sedgewick - Suffix Arrays (video)
- Sedgewick - Substring Search (videos)
- Search pattern in text (video)
If you need more detail on this subject, see "String Matching" section in Additional Detail on Some Subjects
-
Tries
- Note there are different kinds of tries. Some have prefixes, some don't, and some use string instead of bits to track the path.
- I read through code, but will not implement.
- Sedgewick - Tries (3 videos)
- Notes on Data Structures and Programming Techniques
- Short course videos:
- The Trie: A Neglected Data Structure
- TopCoder - Using Tries
- Stanford Lecture (real world use case) (video)
- MIT, Advanced Data Structures, Strings (can get pretty obscure about halfway through) (video)
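For reference while reading through trie code, here is a minimal character-keyed trie. It is one of several possible designs (a dict of children per node); the method names are my own choice.

```python
from typing import Dict, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _find(self, prefix: str) -> Optional[TrieNode]:
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word: str) -> bool:
        node = self._find(word)
        return node is not None and node.is_word

    def starts_with(self, prefix: str) -> bool:
        return self._find(prefix) is not None


trie = Trie()
for w in ("car", "card", "care"):
    trie.insert(w)
print(trie.contains("car"), trie.contains("ca"), trie.starts_with("ca"))
# True False True
```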
-
Floating Point Numbers
-
Unicode
-
Endianness
- Big And Little Endian
- Big Endian Vs Little Endian (video)
- Big And Little Endian Inside/Out (video)
- Very technical talk for kernel devs. Don't worry if most is over your head.
- The first half is enough.
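A quick way to see the difference is to serialize the same 32-bit integer both ways. The sketch uses `int.to_bytes` and the `struct` module from the standard library; the value `0x12345678` is just a convenient example.

```python
import struct
import sys

value = 0x12345678

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# struct uses ">" for big-endian and "<" for little-endian layouts.
assert struct.pack(">I", value) == big
assert struct.pack("<I", value) == little

# Reading the bytes back with the wrong order gives a different number.
print(hex(int.from_bytes(little, byteorder="big")))  # 0x78563412

print(sys.byteorder)  # byte order of the machine running this script
```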
-
Networking
- if you have networking experience or want to be a reliability engineer or operations engineer, expect questions
- otherwise, this is just good to know
- Khan Academy
- UDP and TCP: Comparison of Transport Protocols (video)
- TCP/IP and the OSI Model Explained! (video)
- Packet Transmission across the Internet. Networking & TCP/IP tutorial. (video)
- HTTP (video)
- SSL and HTTPS (video)
- SSL/TLS (video)
- HTTP 2.0 (video)
- Video Series (21 videos) (video)
- Subnetting Demystified - Part 5 CIDR Notation (video)
- Sockets:
System Design, Scalability, Data Handling
You can expect system design questions if you have 4+ years of experience.
- Scalability and System Design are very large topics with many subtopics and resources, since there is a lot to consider when designing a software/hardware system that can scale. Expect to spend quite a bit of time on this.
- Considerations:
- scalability
- Distill large data sets to single values
- Transform one data set to another
- Handling obscenely large amounts of data
- system design
- features sets
- interfaces
- class hierarchies
- designing a system under certain constraints
- simplicity and robustness
- tradeoffs
- performance analysis and optimization
- START HERE: The System Design Primer
- System Design from HiredInTech
- How Do I Prepare To Answer Design Questions In A Technical Interview?
- 8 Things You Need to Know Before a System Design Interview
- Algorithm design
- Database Normalization - 1NF, 2NF, 3NF and 4NF (video)
- System Design Interview - There are a lot of resources in this one. Look through the articles and examples. I put some of them below.
- How to ace a systems design interview
- Numbers Everyone Should Know
- How long does it take to make a context switch?
- Transactions Across Datacenters (video)
- A plain English introduction to CAP Theorem
- Consensus Algorithms:
- Consistent Hashing
- NoSQL Patterns
- Scalability:
- You don't need all of these. Just pick a few that interest you.
- Great overview (video)
- Short series:
- Scalable Web Architecture and Distributed Systems
- Fallacies of Distributed Computing Explained
- Pragmatic Programming Techniques
- Jeff Dean - Building Software Systems At Google and Lessons Learned (video)
- Introduction to Architecting Systems for Scale
- Scaling mobile games to a global audience using App Engine and Cloud Datastore (video)
- How Google Does Planet-Scale Engineering for Planet-Scale Infra (video)
- The Importance of Algorithms
- Sharding
- Scale at Facebook (2012), "Building for a Billion Users" (video)
- Engineering for the Long Game - Astrid Atkinson Keynote (video)
- 7 Years Of YouTube Scalability Lessons In 30 Minutes
- How PayPal Scaled To Billions Of Transactions Daily Using Just 8VMs
- How to Remove Duplicates in Large Datasets
- A look inside Etsy's scale and engineering culture with Jon Cowie (video)
- What Led Amazon to its Own Microservices Architecture
- To Compress Or Not To Compress, That Was Uber's Question
- Asyncio Tarantool Queue, Get In The Queue
- When Should Approximate Query Processing Be Used?
- Google's Transition From Single Datacenter, To Failover, To A Native Multihomed Architecture
- Spanner
- Machine Learning Driven Programming: A New Programming For A New World
- The Image Optimization Technology That Serves Millions Of Requests Per Day
- A Patreon Architecture Short
- Tinder: How Does One Of The Largest Recommendation Engines Decide Who You'll See Next?
- Design Of A Modern Cache
- Live Video Streaming At Facebook Scale
- A Beginner's Guide To Scaling To 11 Million+ Users On Amazon's AWS
- How Does The Use Of Docker Effect Latency?
- A 360 Degree View Of The Entire Netflix Stack
- Latency Is Everywhere And It Costs You Sales - How To Crush It
- Serverless (very long, just need the gist)
- What Powers Instagram: Hundreds of Instances, Dozens of Technologies
- Cinchcast Architecture - Producing 1,500 Hours Of Audio Every Day
- Justin.Tv's Live Video Broadcasting Architecture
- Playfish's Social Gaming Architecture - 50 Million Monthly Users And Growing
- TripAdvisor Architecture - 40M Visitors, 200M Dynamic Page Views, 30TB Data
- PlentyOfFish Architecture
- Salesforce Architecture - How They Handle 1.3 Billion Transactions A Day
- ESPN's Architecture At Scale - Operating At 100,000 Duh Nuh Nuhs Per Second
- See "Messaging, Serialization, and Queueing Systems" way below for info on some of the technologies that can glue services together
- Twitter:
- For even more, see "Mining Massive Datasets" video series in the Video Series section.
- Practicing the system design process: Here are some ideas to try working through on paper, each with some documentation on how it was handled in the real world:
- Review: The System Design Primer
- System Design from HiredInTech
- cheat sheet
- flow:
- Understand the problem and scope:
- define the use cases, with interviewer's help
- suggest additional features
- remove items that interviewer deems out of scope
- assume high availability is required, add as a use case
- Think about constraints:
- ask how many requests per month
- ask how many requests per second (they may volunteer it or make you do the math)
- estimate reads vs. writes percentage
- keep 80/20 rule in mind when estimating
- how much data written per second
- total storage required over 5 years
- how much data read per second
- Abstract design:
- layers (service, data, caching)
- infrastructure: load balancing, messaging
- rough overview of any key algorithm that drives the service
- consider bottlenecks and determine solutions
- Exercises:
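The "Think about constraints" step of the cheat sheet is mostly arithmetic, so it helps to practice it once in code. The sketch below turns requests per month into per-second rates and 5-year storage. The function name, the 30-day month, and the sample inputs (100 million requests per month, 10% writes, 1,000 bytes per write) are all assumptions for illustration.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000, assuming a 30-day month


def estimate_load(requests_per_month: int, write_percent: int,
                  bytes_per_write: int, years: int = 5) -> dict:
    """Back-of-the-envelope rates and storage for a design interview."""
    requests_per_second = requests_per_month / SECONDS_PER_MONTH
    writes_per_month = requests_per_month * write_percent // 100
    writes_per_second = writes_per_month / SECONDS_PER_MONTH
    return {
        "requests_per_second": requests_per_second,
        "writes_per_second": writes_per_second,
        "reads_per_second": requests_per_second - writes_per_second,
        "bytes_written_per_second": writes_per_second * bytes_per_write,
        "storage_bytes": writes_per_month * bytes_per_write * 12 * years,
    }


numbers = estimate_load(100_000_000, write_percent=10, bytes_per_write=1_000)
for name, value in numbers.items():
    print(f"{name}: {value:,.1f}")
```

With these inputs the load is roughly 39 requests per second and about 600 GB of storage over 5 years, which is the kind of rounded figure an interviewer expects you to reach on the whiteboard.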
Final Review
This section will have shorter videos that you can watch pretty quickly to review most of the important concepts.
It's nice if you want a refresher often.
- Series of 2-3 minutes short subject videos (23 videos)
- Series of 2-5 minutes short subject videos - Michael Sambol (18 videos):
- Sedgewick Videos - Algorithms I
- Sedgewick Videos - Algorithms II
Coding Question Practice
Now that you know all the computer science topics above, it's time to practice answering coding problems.
Coding question practice is not about memorizing answers to programming problems.
Why you need to practice doing programming problems:
- problem recognition, and where the right data structures and algorithms fit in
- gathering requirements for the problem
- talking your way through the problem like you will in the interview
- coding on a whiteboard or paper, not a computer
- coming up with time and space complexity for your solutions
- testing your solutions
There is a great intro for methodical, communicative problem solving in an interview. You'll get this from the programming interview books, too, but I found this outstanding: Algorithm design canvas
No whiteboard at home? That makes sense. I'm a weirdo and have a big whiteboard. Instead of a whiteboard, pick up a large drawing pad from an art store. You can sit on the couch and practice. This is my "sofa whiteboard". I added the pen in the photo for scale. If you use a pen, you'll wish you could erase. Gets messy quick.
Supplemental:
- Mathematics for Topcoders
- Dynamic Programming – From Novice to Advanced
- MIT Interview Materials
- Exercises for getting better at a given language
Read and Do Programming Problems (in this order):
- Programming Interviews Exposed: Secrets to Landing Your Next Job, 2nd Edition
- answers in C, C++ and Java
- Cracking the Coding Interview, 6th Edition
- answers in Java
See Book List above
Coding exercises/challenges
Once you've learned your brains out, put those brains to work. Take coding challenges every day, as many as you can.
Coding Interview Question Videos:
Challenge sites:
- LeetCode
- TopCoder
- Project Euler (math-focused)
- Codewars
- HackerEarth
- HackerRank
- Codility
- InterviewCake
- Geeks for Geeks
- InterviewBit
- Sphere Online Judge (spoj)
- Codechef
Challenge repos:
Mock Interviews:
- Gainlo.co: Mock interviewers from big companies - I used this and it helped me relax for the phone screen and on-site interview.
- Pramp: Mock interviews from/with peers - peer-to-peer model of practice interviews
- Refdash: Mock interviews and expedited interviews - also help candidates fast track by skipping multiple interviews with tech companies.
Once you're closer to the interview
- Cracking The Coding Interview Set 2 (videos):
Your Resume
- See Resume prep items in Cracking The Coding Interview and back of Programming Interviews Exposed
Be thinking of for when the interview comes
Think of about 20 interview questions you'll get, along the lines of the items below. Have 2-3 answers for each. Have a story, not just data, about something you accomplished.
- Why do you want this job?
- What's a tough problem you've solved?
- Biggest challenges faced?
- Best/worst designs seen?
- Ideas for improving an existing product.
- How do you work best, as an individual and as part of a team?
- Which of your skills or experiences would be assets in the role and why?
- What did you most enjoy at [job x / project y]?
- What was the biggest challenge you faced at [job x / project y]?
- What was the hardest bug you faced at [job x / project y]?
- What did you learn at [job x / project y]?
- What would you have done better at [job x / project y]?
Have questions for the interviewer
Some of mine (I may already know the answer, but want their opinion or team perspective):
- How large is your team?
- What does your dev cycle look like? Do you do waterfall/sprints/agile?
- Are rushes to deadlines common? Or is there flexibility?
- How are decisions made in your team?
- How many meetings do you have per week?
- Do you feel your work environment helps you concentrate?
- What are you working on?
- What do you like about it?
- What is the work life like?
Once You've Got The Job
Congratulations!
Keep learning.
You're never really done.
*****************************************************************************************************
*****************************************************************************************************
Everything below this point is optional.
By studying these, you'll get greater exposure to more CS concepts, and will be better prepared for
any software engineering job. You'll be a much more well-rounded software engineer.
*****************************************************************************************************
*****************************************************************************************************
Additional Books
-
The Unix Programming Environment
- an oldie but a goodie
-
The Linux Command Line: A Complete Introduction
- a modern option
-
- a gentle introduction to design patterns
-
Design Patterns: Elements of Reusable Object-Oriented Software
- aka the "Gang Of Four" book, or GOF
- the canonical design patterns book
-
Algorithm Design Manual (Skiena)
- As a review and problem recognition
- The algorithm catalog portion is well beyond the scope of difficulty you'll get in an interview.
- This book has 2 parts:
- class textbook on data structures and algorithms
- pros:
- is a good review as any algorithms textbook would be
- nice stories from his experiences solving problems in industry and academia
- code examples in C
- cons:
- can be as dense or impenetrable as CLRS, and in some cases, CLRS may be a better alternative for some subjects
- chapters 7, 8, 9 can be painful to try to follow, as some items are not explained well or require more brain than I have
- don't get me wrong: I like Skiena, his teaching style, and mannerisms, but I may not be Stony Brook material.
- algorithm catalog:
- this is the real reason you buy this book.
- about to get to this part. Will update here once I've made my way through it.
- Can rent it on kindle
- Answers:
- Errata
-
- Important: Reading this book will only have limited value. This book is a great review of algorithms and data structures, but won't teach you how to write good code. You have to be able to code a decent solution efficiently.
- aka CLR, sometimes CLRS, because Stein was late to the game
-
Computer Architecture, Sixth Edition: A Quantitative Approach
- For a richer, more up-to-date (2017), but longer treatment
-
- The first couple of chapters present clever solutions to programming problems (some very old, using data tape), but that is just an intro. This is a guidebook on program design and architecture, much like Code Complete, but much shorter.
Additional Learning
These topics will likely not come up in an interview, but I added them to help you become a well-rounded software engineer, and to be aware of certain technologies and algorithms, so you'll have a bigger toolbox.
-
Compilers
-
Emacs and vi(m)
- Familiarize yourself with a unix-based code editor
- vi(m):
- emacs:
-
Unix command line tools
-
Information theory (videos)
- Khan Academy
- more about Markov processes:
- See more in MIT 6.050J Information and Entropy series below.
-
Parity & Hamming Code (videos)
- Intro
- Parity
- Hamming Code:
- Error Checking
-
Entropy
- also see videos below
- make sure to watch information theory videos first
- Information Theory, Claude Shannon, Entropy, Redundancy, Data Compression & Bits (video)
-
Cryptography
- also see videos below
- make sure to watch information theory videos first
- Khan Academy Series
- Cryptography: Hash Functions
- Cryptography: Encryption
-
Compression
- make sure to watch information theory videos first
- Computerphile (videos):
- Compressor Head videos
- (optional) Google Developers Live: GZIP is not enough!
-
Computer Security
-
Garbage collection
-
Parallel Programming
-
Messaging, Serialization, and Queueing Systems
-
A*
-
Fast Fourier Transform
-
Bloom Filter
- Given a Bloom filter with m bits and k hashing functions, both insertion and membership testing are O(k)
- Bloom Filters (video)
- Bloom Filters | Mining of Massive Datasets | Stanford University (video)
- Tutorial
- How To Write A Bloom Filter App
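The O(k) claim above is easy to see in code: both operations touch exactly k positions in the bit array. This is a teaching sketch, not a tuned implementation. It derives the k hash functions by salting SHA-256 with an index and stores one byte per bit for readability; both choices are assumptions of mine, and real implementations typically use faster hashes and a packed bit array.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    def __init__(self, m: int, k: int) -> None:
        self.m = m                 # number of bits
        self.k = k                 # number of hash functions
        self.bits = bytearray(m)   # one byte per bit, for clarity only

    def _positions(self, item: str) -> Iterator[int]:
        # Simulate k independent hash functions by salting with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=3)
bf.add("apple")
print(bf.might_contain("apple"))  # True (no false negatives)
```

A lookup for an item that was never added usually returns False, but it can return True when other items happen to have set the same k bits. That false-positive rate is the price paid for the small memory footprint.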
-
HyperLogLog
-
Locality-Sensitive Hashing
- used to determine the similarity of documents
- the opposite of MD5 or SHA which are used to determine if 2 documents/strings are exactly the same.
- Simhashing (hopefully) made simple
-
van Emde Boas Trees
-
Augmented Data Structures
-
Balanced search trees
-
Know at least one type of balanced binary tree (and know how it's implemented):
-
"Among balanced search trees, AVL and 2/3 trees are now passé, and red-black trees seem to be more popular. A particularly interesting self-organizing data structure is the splay tree, which uses rotations to move any accessed key to the root." - Skiena
-
Of these, I chose to implement a splay tree. From what I've read, you won't implement a balanced search tree in your interview. But I wanted exposure to coding one up and let's face it, splay trees are the bee's knees. I did read a lot of red-black tree code.
- splay tree: insert, search, delete functions
If you end up implementing red/black tree try just these:
- search and insertion functions, skipping delete
-
I want to learn more about B-Tree since it's used so widely with very large data sets.
-
AVL trees
- In practice: From what I can tell, these aren't used much in practice, but I could see where they would be: The AVL tree is another structure supporting O(log n) search, insertion, and removal. It is more rigidly balanced than red–black trees, leading to slower insertion and removal but faster retrieval. This makes it attractive for data structures that may be built once and loaded without reconstruction, such as language dictionaries (or program dictionaries, such as the opcodes of an assembler or interpreter).
- MIT AVL Trees / AVL Sort (video)
- AVL Trees (video)
- AVL Tree Implementation (video)
- Split And Merge
-
Splay trees
- In practice: Splay trees are typically used in the implementation of caches, memory allocators, routers, garbage collectors, data compression, ropes (replacement of string used for long text strings), in Windows NT (in the virtual memory, networking and file system code) etc.
- CS 61B: Splay Trees (video)
- MIT Lecture: Splay Trees:
- Gets very mathy, but watch the last 10 minutes for sure.
- Video
-
Red/black trees
- these are a translation of a 2-3 tree (see below)
- In practice: Red–black trees offer worst-case guarantees for insertion time, deletion time, and search time. Not only does this make them valuable in time-sensitive applications such as real-time applications, but it makes them valuable building blocks in other data structures which provide worst-case guarantees; for example, many data structures used in computational geometry can be based on red–black trees, and the Completely Fair Scheduler used in current Linux kernels uses red–black trees. In version 8 of Java, HashMap was modified so that buckets with many colliding keys (for example, from poor hashcodes) are stored in a red-black tree instead of a linked list.
- Aduni - Algorithms - Lecture 4 (link jumps to starting point) (video)
- Aduni - Algorithms - Lecture 5 (video)
- Red-Black Tree
- An Introduction To Binary Search And Red Black Tree
-
2-3 search trees
- In practice: 2-3 trees have faster inserts at the expense of slower searches (since their height is greater than that of AVL trees).
- You would use 2-3 tree very rarely because its implementation involves different types of nodes. Instead, people use Red Black trees.
- 23-Tree Intuition and Definition (video)
- Binary View of 23-Tree
- 2-3 Trees (student recitation) (video)
-
2-3-4 Trees (aka 2-4 trees)
- In practice: For every 2-4 tree, there are corresponding red–black trees with data elements in the same order. The insertion and deletion operations on 2-4 trees are also equivalent to color-flipping and rotations in red–black trees. This makes 2-4 trees an important tool for understanding the logic behind red–black trees, and this is why many introductory algorithm texts introduce 2-4 trees just before red–black trees, even though 2-4 trees are not often used in practice.
- CS 61B Lecture 26: Balanced Search Trees (video)
- Bottom Up 234-Trees (video)
- Top Down 234-Trees (video)
-
N-ary (K-ary, M-ary) trees
- note: the N or K is the branching factor (max branches)
- a binary tree is a 2-ary tree, with branching factor = 2
- 2-3 trees are 3-ary
- K-Ary Tree
-
B-Trees
- fun fact: it's a mystery, but the B could stand for Boeing, Balanced, or Bayer (co-inventor)
- In Practice: B-Trees are widely used in databases, and most modern filesystems use B-trees (or variants) to allow quick random access to an arbitrary block in a particular file. The basic problem is turning the file block i address into a disk block (or perhaps a cylinder-head-sector) address.
- B-Tree
- Introduction to B-Trees (video)
- B-Tree Definition and Insertion (video)
- B-Tree Deletion (video)
- MIT 6.851 - Memory Hierarchy Models (video) - covers cache-oblivious B-Trees, very interesting data structures - the first 37 minutes are very technical, may be skipped (B is block size, cache line size)
-
-
k-D Trees
- great for finding number of points in a rectangle or higher dimension object
- a good fit for k-nearest neighbors
- Kd Trees (video)
- kNN K-d tree algorithm (video)
-
Skip lists
- "These are somewhat of a cult data structure" - Skiena
- Randomization: Skip Lists (video)
- For animations and a little more detail
-
Network Flows
-
Disjoint Sets & Union Find
-
Math for Fast Processing
-
Treap
- Combination of a binary search tree and a heap
- Treap
- Data Structures: Treaps explained (video)
- Applications in set operations
-
Linear Programming (videos)
-
Geometry, Convex hull (videos)
-
Discrete math
- see videos below
-
Machine Learning
- Why ML?
- Google's Cloud Machine learning tools (video)
- Google Developers' Machine Learning Recipes (Scikit Learn & Tensorflow) (video)
- Tensorflow (video)
- Tensorflow Tutorials
- Practical Guide to implementing Neural Networks in Python (using Theano)
- Courses:
- Great starter course: Machine Learning - videos only - see videos 12-18 for a review of linear algebra (14 and 15 are duplicates)
- Neural Networks for Machine Learning
- Google's Deep Learning Nanodegree
- Google/Kaggle Machine Learning Engineer Nanodegree
- Self-Driving Car Engineer Nanodegree
- Metis Online Course ($99 for 2 months)
- Resources:
Additional Detail on Some Subjects
I added these to reinforce some ideas already presented above, but didn't want to include them
above because it's just too much. It's easy to overdo it on a subject.
You want to get hired in this century, right?
-
Union-Find
-
More Dynamic Programming (videos)
- 6.006: Dynamic Programming I: Fibonacci, Shortest Paths
- 6.006: Dynamic Programming II: Text Justification, Blackjack
- 6.006: DP III: Parenthesization, Edit Distance, Knapsack
- 6.006: DP IV: Guitar Fingering, Tetris, Super Mario Bros.
- 6.046: Dynamic Programming & Advanced DP
- 6.046: Dynamic Programming: All-Pairs Shortest Paths
- 6.046: Dynamic Programming (student recitation)
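Edit distance, one of the problems covered in the lectures above, is a good DP to be able to write from memory. The sketch below is the standard Levenshtein recurrence (insert, delete, substitute, each at cost 1), keeping only two rows of the table; the function name is my own.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    # prev[j] = distance between the first i-1 chars of a and first j of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if ca == cb else 1)
            delete = prev[j] + 1
            insert = cur[j - 1] + 1
            cur.append(min(substitute, delete, insert))
        prev = cur
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Time is O(len(a) * len(b)); space is O(len(b)) because each row depends only on the one before it.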
-
Advanced Graph Processing (videos)
-
MIT Probability (mathy, and go slowly, which is good for mathy things) (videos):
-
String Matching
- Rabin-Karp (videos):
- Knuth-Morris-Pratt (KMP):
- Boyer–Moore string search algorithm
- Coursera: Algorithms on Strings
- starts off great, but by the time it gets past KMP it gets more complicated than it needs to be
- nice explanation of tries
- can be skipped
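As a companion to the KMP videos, here is a compact sketch of Knuth-Morris-Pratt. The failure table records, for each prefix of the pattern, the length of the longest proper prefix that is also a suffix, which lets the search avoid re-reading text characters. Function names are my own.

```python
from typing import List


def build_failure_table(pattern: str) -> List[int]:
    """table[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0  # length of the current matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]  # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text: str, pattern: str) -> int:
    """Index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    table = build_failure_table(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


print(build_failure_table("ababaca"))    # [0, 0, 1, 2, 3, 0, 1]
print(kmp_search("abracadabra", "cad"))  # 4
```

The search runs in O(n + m) for a text of length n and pattern of length m, since `k` can only fall back as many times as it has advanced.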
-
Sorting
- Stanford lectures on sorting:
- Shai Simonson, Aduni.org:
- Steven Skiena lectures on sorting:
Video Series
Sit back and enjoy. "Netflix and skill" :P
-
List of individual Dynamic Programming problems (each is short)
-
Excellent - MIT Calculus Revisited: Single Variable Calculus
-
Computer Science 70, 001 - Spring 2015 - Discrete Mathematics and Probability Theory
-
CSE373 - Analysis of Algorithms (25 videos)
-
UC Berkeley CS 152: Computer Architecture and Engineering (20 videos) -
Carnegie Mellon - Computer Architecture Lectures (39 videos)
-
MIT 6.042J: Mathematics for Computer Science, Fall 2010 (25 videos)
-
MIT 6.050J: Information and Entropy, Spring 2008 (19 videos)
Computer Science Courses
Papers (Scientific Articles)
- Love classic papers?
- 1978: Communicating Sequential Processes
- 2003: The Google File System
- replaced by Colossus in 2012
- 2004: MapReduce: Simplified Data Processing on Large Clusters
- mostly replaced by Cloud Dataflow?
- 2006: Bigtable: A Distributed Storage System for Structured Data
- 2006: The Chubby Lock Service for Loosely-Coupled Distributed Systems
- 2007: Dynamo: Amazon’s Highly Available Key-value Store
- The Dynamo paper kicked off the NoSQL revolution.
- 2007: What Every Programmer Should Know About Memory (very long, and the author encourages skipping of some sections)
- 2010: Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
- 2010: Dremel: Interactive Analysis of Web-Scale Datasets
- 2012: Google's Colossus
- paper not available
- 2012: AddressSanitizer: A Fast Address Sanity Checker:
- 2013: Spanner: Google’s Globally-Distributed Database:
- 2014: Machine Learning: The High-Interest Credit Card of Technical Debt
- 2015: Continuous Pipelines at Google
- 2015: High-Availability at Massive Scale: Building Google’s Data Infrastructure for Ads
- 2015: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
- 2015: How Developers Search for Code: A Case Study
- 2016: Borg, Omega, and Kubernetes