init git and python-web courses

Sanket Patel 2020-11-03 17:15:17 +05:30
parent 3a215d2526
commit 5135eff925
9 changed files with 1000 additions and 0 deletions

courses/git/branches.md Normal file

@ -0,0 +1,183 @@
# Working With Branches
Coming back to our local repo, which now has two commits: so far, what we have is a single line of history, with commits chained one after another. But sometimes you may need to work on two different features in parallel in the same repo. One option here could be making a new folder/repo with the same code and using that for the other feature's development. But there's a better way: use _branches._ Since git follows a tree-like structure for commits, we can use branches to work on different sets of features. From a commit, two or more branches can be created, and branches can also be merged.
Using branches, there can exist multiple lines of history, and we can check out any of them and work on it. Checking out, as we discussed earlier, simply means replacing the contents of the directory (repo) with the snapshot of contents at the checked-out version.
Let's create a branch and see what it looks like:
```bash
spatel1-mn1:school-of-sre spatel1$ git branch b1
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph
* 7f3b00e (HEAD -> master, b1) adding file 2
* df2fb7a adding file 1
```
We created a branch called `b1`. Git log tells us that b1 also points to the last commit (7f3b00e), but `HEAD` is still pointing to master. If you remember, HEAD points to the commit/reference you are currently checked out at. So if we check out `b1`, HEAD should point to that. Let's confirm:
```bash
spatel1-mn1:school-of-sre spatel1$ git checkout b1
Switched to branch 'b1'
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph
* 7f3b00e (HEAD -> b1, master) adding file 2
* df2fb7a adding file 1
```
`b1` still points to the same commit, but HEAD now points to `b1`. Since we created the branch at commit `7f3b00e`, there will be two lines of history starting from this commit. Depending on which branch you are checked out on, the line of history will progress.
At this moment, we are checked out on branch `b1`, so making a new commit will advance the branch reference `b1` to that commit, and the current `b1` commit will become its parent. Let's do that.
```bash
# Creating a file and making a commit
spatel1-mn1:school-of-sre spatel1$ echo "I am a file in b1 branch" > b1.txt
spatel1-mn1:school-of-sre spatel1$ git add b1.txt
spatel1-mn1:school-of-sre spatel1$ git commit -m "adding b1 file"
[b1 872a38f] adding b1 file
1 file changed, 1 insertion(+)
create mode 100644 b1.txt
# The new line of history
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph
* 872a38f (HEAD -> b1) adding b1 file
* 7f3b00e (master) adding file 2
* df2fb7a adding file 1
spatel1-mn1:school-of-sre spatel1$
```
Do note that master is still pointing to the commit it was pointing to earlier. We can now check out the master branch and make commits there. This will result in another line of history starting from commit 7f3b00e.
```bash
# checkout to master branch
spatel1-mn1:school-of-sre spatel1$ git checkout master
Switched to branch 'master'
# Creating a new commit on master branch
spatel1-mn1:school-of-sre spatel1$ echo "new file in master branch" > master.txt
spatel1-mn1:school-of-sre spatel1$ git add master.txt
spatel1-mn1:school-of-sre spatel1$ git commit -m "adding master.txt file"
[master 60dc441] adding master.txt file
1 file changed, 1 insertion(+)
create mode 100644 master.txt
# The history line
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph
* 60dc441 (HEAD -> master) adding master.txt file
* 7f3b00e adding file 2
* df2fb7a adding file 1
```
Notice how branch b1 is not visible here since we are checked out on master. Let's try to visualize both to get the whole picture:
```bash
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph --all
* 60dc441 (HEAD -> master) adding master.txt file
| * 872a38f (b1) adding b1 file
|/
* 7f3b00e adding file 2
* df2fb7a adding file 1
```
The above tree structure should make things clear. Notice the clear branch/fork at commit 7f3b00e. This is how we create branches. Now there are two separate lines of history on which feature development can be done independently.
**To reiterate, internally, git is just a tree of commits. Branch names (human readable) are pointers to those commits in the tree. We use various git commands to work with the tree structure and references. Git accordingly modifies the contents of our repo.**
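Since branch names are just pointers, you can see where they point right inside the `.git` folder, just like we did with `master` in the Git Basics module. The commands below are illustrative; each prints the full commit ID abbreviated in the log above (872a38f for `b1` and 60dc441 for `master`):
```bash
# Branch references are just files containing a commit ID
cat .git/refs/heads/b1      # prints the full commit ID that 872a38f abbreviates
cat .git/refs/heads/master  # prints the full commit ID that 60dc441 abbreviates
```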
## Merges
Now say the feature you were working on in branch `b1` is complete, and you need to merge it into the master branch, where the final version of the code goes. So first you will check out the master branch and pull the latest code from upstream (eg: GitHub). Then you need to merge your code from `b1` into master. There are two ways this can be done.
Here is the current history:
```bash
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph --all
* 60dc441 (HEAD -> master) adding master.txt file
| * 872a38f (b1) adding b1 file
|/
* 7f3b00e adding file 2
* df2fb7a adding file 1
```
**Option 1: Directly merge the branch.** Merging the branch b1 into master will result in a new merge commit which will merge changes from two different lines of history and create a new commit of the result.
```bash
spatel1-mn1:school-of-sre spatel1$ git merge b1
Merge made by the 'recursive' strategy.
b1.txt | 1 +
1 file changed, 1 insertion(+)
create mode 100644 b1.txt
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph --all
* 8fc28f9 (HEAD -> master) Merge branch 'b1'
|\
| * 872a38f (b1) adding b1 file
* | 60dc441 adding master.txt file
|/
* 7f3b00e adding file 2
* df2fb7a adding file 1
```
You can see a new merge commit created (8fc28f9). You will be prompted for the commit message. If there are a lot of branches in the repo, this will result in a lot of merge commits, which looks ugly compared to a single line of history of development. So let's look at an alternative approach.
First, let's [reset](https://git-scm.com/docs/git-reset) our last merge and go back to the previous state.
```bash
spatel1-mn1:school-of-sre spatel1$ git reset --hard 60dc441
HEAD is now at 60dc441 adding master.txt file
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph --all
* 60dc441 (HEAD -> master) adding master.txt file
| * 872a38f (b1) adding b1 file
|/
* 7f3b00e adding file 2
* df2fb7a adding file 1
```
**Option 2: Rebase.** Now, instead of merging two branches which have the same base (commit: 7f3b00e), let us rebase branch b1 onto the current master. **What this means is take branch `b1` (from commit 7f3b00e to commit 872a38f) and rebase it (put it on top of) master (60dc441).**
```bash
# Switch to b1
spatel1-mn1:school-of-sre spatel1$ git checkout b1
Switched to branch 'b1'
# Rebase (b1 which is current branch) on master
spatel1-mn1:school-of-sre spatel1$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: adding b1 file
# The result
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph --all
* 5372c8f (HEAD -> b1) adding b1 file
* 60dc441 (master) adding master.txt file
* 7f3b00e adding file 2
* df2fb7a adding file 1
```
You can see that `b1` had one commit, whose parent was `7f3b00e`. But since we rebased it onto master (`60dc441`), that now becomes the parent. As a side effect, you can also see that it has become a single line of history. Now if we were to merge `b1` into `master`, it would simply mean changing `master` to point to `5372c8f`, which is `b1`. Let's try it:
```bash
# checkout to master since we want to merge code into master
spatel1-mn1:school-of-sre spatel1$ git checkout master
Switched to branch 'master'
# the current history, where b1 is based on master
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph --all
* 5372c8f (b1) adding b1 file
* 60dc441 (HEAD -> master) adding master.txt file
* 7f3b00e adding file 2
* df2fb7a adding file 1
# Performing the merge, notice the "fast-forward" message
spatel1-mn1:school-of-sre spatel1$ git merge b1
Updating 60dc441..5372c8f
Fast-forward
b1.txt | 1 +
1 file changed, 1 insertion(+)
create mode 100644 b1.txt
# The Result
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph --all
* 5372c8f (HEAD -> master, b1) adding b1 file
* 60dc441 adding master.txt file
* 7f3b00e adding file 2
* df2fb7a adding file 1
```
Now you see both `b1` and `master` are pointing to the same commit. Your code has been merged into the master branch and it can be pushed. Also, we have a clean line of history! :D
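As a hypothetical final step (pushing is covered in the Git with GitHub section), assuming a remote named `origin` is already configured, publishing the merged history would look like this:
```bash
# Hypothetical: publish the merged master branch to a remote named "origin"
git push origin master
```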

courses/git/git-basics.md Normal file

@ -0,0 +1,258 @@
# School Of SRE: Git
## Pre - Reads
1. Have Git installed [https://git-scm.com/downloads](https://git-scm.com/downloads)
2. Have taken any high-level git tutorial or gone through the following courses/resources:
- [https://www.linkedin.com/learning/git-essential-training-the-basics/](https://www.linkedin.com/learning/git-essential-training-the-basics/)
- [https://www.linkedin.com/learning/git-branches-merges-and-remotes/](https://www.linkedin.com/learning/git-branches-merges-and-remotes/)
- [The Official Git Docs](https://git-scm.com/doc)
## What to expect from this training
As an engineer in the field of computer science, having knowledge of version control tools becomes almost a requirement. While there are a lot of version control tools that exist today, Git is perhaps the most used one, and in this course we will be working with Git. While this course does not start with Git 101 and expects basic knowledge of git as a prerequisite, it will reintroduce the git concepts you already know, with details covering what is happening under the hood as you execute various git commands, so that the next time you run a git command, you will be able to press enter more confidently!
## What is not covered under this training
Advanced usage and specifics of internal implementation details of Git.
## Training Content
### Table of Contents
1. Git Basics
2. Working with Branches
3. Git with Github
4. Hooks
## Git Basics
Though you might be aware of it already, let's revisit why we need a version control system. As a project grows and multiple developers start working on it, an efficient method for collaboration is warranted. Git helps the team collaborate easily and also maintains a history of the changes that happen to the codebase.
### Creating a Git Repo
Any folder can be converted into a git repository. After executing the following command, we will see a `.git` folder within the folder, which makes our folder a git repository. **The `.git` folder is the enabler for all the magic that git does.**
```bash
# creating an empty folder and changing current dir to it
spatel1-mn1:~ spatel1$ cd /tmp
spatel1-mn1:tmp spatel1$ mkdir school-of-sre
spatel1-mn1:tmp spatel1$ cd school-of-sre/
# initialize a git repo
spatel1-mn1:school-of-sre spatel1$ git init
Initialized empty Git repository in /private/tmp/school-of-sre/.git/
```
As the output says, an empty git repo has been initialized in our folder. Let's take a look at what is there.
```bash
spatel1-mn1:school-of-sre spatel1$ ls .git/
HEAD config description hooks info objects refs
```
There are a bunch of folders and files in the `.git` folder. As I said, all of these enable git to do its magic. We will look into some of these folders and files. But for now, what we have is an empty git repository.
### Tracking a File
Now, as you might already know, let us create a new file in our repo (we will refer to the folder as the _repo_ from now on) and see the git status.
```bash
spatel1-mn1:school-of-sre spatel1$ echo "I am file 1" > file1.txt
spatel1-mn1:school-of-sre spatel1$ git status
On branch master
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
file1.txt
nothing added to commit but untracked files present (use "git add" to track)
```
The current git status says `No commits yet` and there is one untracked file. Since we just created the file, git is not tracking it. We explicitly need to ask git to track files and folders (also check out [gitignore](https://git-scm.com/docs/gitignore)), and we do that via the `git add` command, as suggested in the above output. Then we go ahead and create a commit.
```bash
spatel1-mn1:school-of-sre spatel1$ git add file1.txt
spatel1-mn1:school-of-sre spatel1$ git status
On branch master
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: file1.txt
spatel1-mn1:school-of-sre spatel1$ git commit -m "adding file 1"
[master (root-commit) df2fb7a] adding file 1
1 file changed, 1 insertion(+)
create mode 100644 file1.txt
```
Notice how after adding the file, git status says `Changes to be committed:`. What it means is that whatever is listed there will be included in the next commit. Then we go ahead and create a commit, with an attached message via `-m`.
### More About a Commit
A commit is a snapshot of the repo. Whenever a commit is made, a snapshot of the current state of the repo (the folder) is taken and saved. Each commit has a unique ID (`df2fb7a` for the commit we made in the previous step). As we keep adding/changing more and more content and keep making commits, all those snapshots are stored by git. Again, all this magic happens inside the `.git` folder. This is where all these snapshots or versions are stored, in an efficient manner.
### Adding More Changes
Let us create one more file and commit the change. It would look the same as the previous commit we made.
```bash
spatel1-mn1:school-of-sre spatel1$ echo "I am file 2" > file2.txt
spatel1-mn1:school-of-sre spatel1$ git add file2.txt
spatel1-mn1:school-of-sre spatel1$ git commit -m "adding file 2"
[master 7f3b00e] adding file 2
1 file changed, 1 insertion(+)
create mode 100644 file2.txt
```
A new commit with ID `7f3b00e` has been created. You can issue `git status` at any time to see the state of the repository.
**IMPORTANT: Note that commit IDs are long strings (SHA), but we can refer to a commit by its initial few (8 or more) characters too. We will be using shorter and longer commit IDs interchangeably.**
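For example (a quick, illustrative check; the full ID shown is the one for our first commit, which also appears inside the commit object later in this module):
```bash
# Expand a short commit ID to its full SHA
git rev-parse df2fb7a
# df2fb7a61f5d40c1191e0fdeb0fc5d6e7969685a
```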
Now that we have two commits, let's visualize them:
```bash
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph
* 7f3b00e (HEAD -> master) adding file 2
* df2fb7a adding file 1
```
`git log`, as the name suggests, prints the log of all the git commits. Here you see two additional arguments: `--oneline` prints the shorter version of the log, i.e. only the commit message and not the person who made the commit and when, and `--graph` prints it in graph format.
**Now at this moment the commits might look like just one in each line but all commits are stored as a tree like data structure internally by git. That means there can be two or more children commits of a given commit. And not just a single line of commits. We will look more into this part when we get to the Branches section. For now this is our commit history:**
```bash
df2fb7a ===> 7f3b00e
```
### Are commits really linked?
As I just said, the two commits we just made are linked via a tree-like data structure, and we saw how they are linked. But let's actually verify it. Everything in git is an object. Newly created files are stored as objects. Changes to files are stored as objects, and even commits are objects. To view the contents of an object, we can use the following command with the object's ID. Let's take a look at the contents of the second commit.
```bash
spatel1-mn1:school-of-sre spatel1$ git cat-file -p 7f3b00e
tree ebf3af44d253e5328340026e45a9fa9ae3ea1982
parent df2fb7a61f5d40c1191e0fdeb0fc5d6e7969685a
author Sanket Patel <spatel1@linkedin.com> 1603273316 -0700
committer Sanket Patel <spatel1@linkedin.com> 1603273316 -0700
adding file 2
```
Take note of the `parent` attribute in the above output. It points to the commit ID of the first commit we made. So this proves that they are linked! Additionally, you can see the second commit's message in this object. As I said, all this magic is enabled by the `.git` folder, and the object we are looking at is also in that folder.
```bash
spatel1-mn1:school-of-sre spatel1$ ls .git/objects/7f/3b00eaa957815884198e2fdfec29361108d6a9
.git/objects/7f/3b00eaa957815884198e2fdfec29361108d6a9
```
It is stored in the `.git/objects/` folder. All the files, and the changes to them as well, are stored in this folder.
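If you are curious, you can list every object git has stored so far (illustrative; the hashes will differ in your repo):
```bash
# List all object files stored by git (blobs, trees and commits)
find .git/objects -type f
```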
### The Version Control part of Git
We can already see two commits (versions) in our git log. One thing a version control tool gives you is the ability to browse back and forth in history. For example: some of your users are running an old version of the code and they are reporting an issue. In order to debug the issue, you need access to the old code. The one in your current repo is the latest code. In this example, you are working on the second commit (7f3b00e) and someone reported an issue with the code snapshot at commit (df2fb7a). This is how you would get access to the code at any older commit:
```bash
# Current contents, two files present
patel1-mn1:school-of-sre spatel1$ ls
file1.txt file2.txt
# checking out to (an older) commit
spatel1-mn1:school-of-sre spatel1$ git checkout df2fb7a
Note: checking out 'df2fb7a'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at df2fb7a adding file 1
# checking contents, can verify it has old contents
spatel1-mn1:school-of-sre spatel1$ ls
file1.txt
```
So this is how we get access to old versions/snapshots. All we need is a _reference_ to that snapshot. Upon executing `git checkout ...`, what git does for you is use the `.git` folder, see what the state of things (files and folders) was at that version/reference, and replace the contents of the current directory with those contents. The previously existing content will no longer be present in the local dir (repo), but we can and will still get access to it, because it is tracked via a git commit and the `.git` folder has it stored.
### Reference
I mentioned in the previous section that we need a _reference_ to the version. By default, a git repo is made of a tree of commits, and each commit has a unique ID. But the unique ID is not the only thing we can reference commits by. There are multiple ways to reference commits. For example: `HEAD` is a reference to the current commit. _Whatever commit your repo is checked out at, `HEAD` will point to that._ `HEAD~1` is a reference to the previous commit. So while checking out the previous version in the section above, we could have done `git checkout HEAD~1`.
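If you want to see which commit a reference resolves to, `git rev-parse` helps (illustrative; run it once you are back on the latest commit so that `HEAD~1` has a parent to point at):
```bash
# Resolve references to the commit IDs they point to
git rev-parse HEAD       # the commit currently checked out
git rev-parse HEAD~1     # its parent commit
```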
Similarly, master is also a reference (to a branch). Since git uses a tree-like structure to store commits, there of course will be branches. And the default branch is called `master`. Master (or any branch reference) will point to the latest commit in the branch. Even though we have checked out the previous commit in our repo, `master` still points to the latest commit. And we can get back to the latest version by checking out the `master` reference.
```bash
spatel1-mn1:school-of-sre spatel1$ git checkout master
Previous HEAD position was df2fb7a adding file 1
Switched to branch 'master'
# now we will see latest code, with two files
spatel1-mn1:school-of-sre spatel1$ ls
file1.txt file2.txt
```
Note: instead of `master` in the above command, we could have used the commit's ID as well.
### References and The Magic
Let's look at the state of things: two commits, with the `master` and `HEAD` references pointing to the latest commit.
```bash
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph
* 7f3b00e (HEAD -> master) adding file 2
* df2fb7a adding file 1
```
The magic? Let's examine these files:
```bash
spatel1-mn1:school-of-sre spatel1$ cat .git/refs/heads/master
7f3b00eaa957815884198e2fdfec29361108d6a9
```
Voila! Where master points is stored in a file. **Whenever git needs to know where the master reference is pointing, or needs to update where master points, it just needs to read/update the file above.** So when you create a new commit, a new commit is created on top of the current commit and the master file is updated with the new commit's ID.
Similarly, for the `HEAD` reference:
```bash
spatel1-mn1:school-of-sre spatel1$ cat .git/HEAD
ref: refs/heads/master
```
We can see that `HEAD` is pointing to a reference called `refs/heads/master`. So `HEAD` will point wherever `master` points.
### Little Adventure
We discussed how git will update the files as we execute commands. But let's try to do it ourselves, by hand, and see what happens.
```bash
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph
* 7f3b00e (HEAD -> master) adding file 2
* df2fb7a adding file 1
```
Now let's change master to point to the previous/first commit.
```bash
spatel1-mn1:school-of-sre spatel1$ echo df2fb7a61f5d40c1191e0fdeb0fc5d6e7969685a > .git/refs/heads/master
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph
* df2fb7a (HEAD -> master) adding file 1
# RESETTING TO ORIGINAL
spatel1-mn1:school-of-sre spatel1$ echo 7f3b00eaa957815884198e2fdfec29361108d6a9 > .git/refs/heads/master
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph
* 7f3b00e (HEAD -> master) adding file 2
* df2fb7a adding file 1
```
We just edited the `master` reference file and now we can see only the first commit in the git log. Undoing the change to the file brings the state back to the original. Not so much magic, is it?


@ -0,0 +1,52 @@
## Git with Github
Till now, all the operations we did were in our local repo, but git also helps us in a collaborative environment. GitHub is one place on the internet where you can centrally host your git repos and collaborate with other developers.
Most of the workflow will remain the same as we discussed, with the addition of a couple of things:
1. Pull: to pull the latest changes from the GitHub (central) repo
2. Push: to push your changes to the GitHub repo so that they're available to everyone
GitHub has written nice guides and tutorials about this and you can refer to them here (a minimal sketch of this workflow follows the links):
- [GitHub Hello World](https://guides.github.com/activities/hello-world/)
- [Git Handbook](https://guides.github.com/introduction/git-handbook/)
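The following is a hypothetical day-to-day flow; the repository URL and the remote name `origin` are placeholders for illustration:
```bash
# Get a local copy of a repo hosted on GitHub (URL is hypothetical)
git clone https://github.com/<user>/school-of-sre.git
cd school-of-sre
# Pull the latest changes from the central repo
git pull origin master
# ...make changes, git add and git commit them locally...
# Push your commits so that they are available to everyone
git push origin master
```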
## Hooks
Git has another nice feature called hooks. Hooks are basically scripts which will be called when a certain event happens. Here is where hooks are located:
```bash
spatel1-mn1:school-of-sre spatel1$ ls .git/hooks/
applypatch-msg.sample fsmonitor-watchman.sample pre-applypatch.sample pre-push.sample pre-receive.sample update.sample
commit-msg.sample post-update.sample pre-commit.sample pre-rebase.sample prepare-commit-msg.sample
```
The names are self-explanatory. These hooks are useful when you want to do certain things when a certain event happens. For example, if you want to run tests before pushing code, you would set up a `pre-push` hook. Let's try to create a pre-commit hook.
```bash
spatel1-mn1:school-of-sre spatel1$ echo "echo this is from pre commit hook" > .git/hooks/pre-commit
spatel1-mn1:school-of-sre spatel1$ chmod +x .git/hooks/pre-commit
```
We basically create a file called `pre-commit` in the hooks folder and make it executable. Now if we make a commit, we should see the message getting printed.
```bash
spatel1-mn1:school-of-sre spatel1$ echo "sample file" > sample.txt
spatel1-mn1:school-of-sre spatel1$ git add sample.txt
spatel1-mn1:school-of-sre spatel1$ git commit -m "adding sample file"
this is from pre commit hook # <===== THE MESSAGE FROM HOOK EXECUTION
[master 9894e05] adding sample file
1 file changed, 1 insertion(+)
create mode 100644 sample.txt
```
## What next from here?
There are a lot of git commands and features which we have not explored here. But with this base built up, be sure to explore concepts like:
- Cherrypick
- Squash
- Amend
- Stash
- Reset

courses/python_web/intro.md Normal file

@ -0,0 +1,114 @@
# School of SRE: Python and The Web
## Pre - Reads
- Basic understanding of python language.
- Basic familiarity with flask framework.
## What to expect from this training
This course is divided into two high-level parts. In the first part, assuming familiarity with the python language's basic operations and syntax, we will dive a little deeper into understanding python as a language. We will compare python with other programming languages that you might already know, like Java and C. We will also explore the concept of Python objects and, with the help of that, explore python features like decorators.
In the second part, which revolves around the web and also assumes familiarity with the Flask framework, we will start from the socket module and work our way up to HTTP requests. This will demystify how frameworks like flask work internally.
And to introduce an SRE flavour to the course, we will design, develop and deploy (in theory) a URL shortening application. We will emphasize the parts of the whole process that are more important to you as the SRE of the said app/service.
## What is not covered under this training
Extensive knowledge of python internals and advanced python.
## Training Content
### Lab Environment Setup
Have the latest version of python installed.
### TOC
1. The Python Language
1. Some Python Concepts
2. Python Gotchas
2. Python and Web
1. Sockets
2. Flask
3. The URL Shortening App
1. Design
2. Scaling The App
3. Monitoring The App
## The Python Language
Assuming you know a little bit of C/C++ and Java, let's try to discuss the following questions in the context of those two languages and python. You might have heard that C/C++ is a compiled language while python is an interpreted language. Generally, with a compiled language we first compile the program and then run the executable, while in the case of python we run the source code directly, like `python hello_world.py`. Java, meanwhile, though often also called an interpreted language, still has a separate compilation step and is then run. So what's really the difference?
### Compiled vs. Interpreted
This might sound a little weird to you: python, in a way, is a compiled language! Python has a compiler built in! This is obvious in the case of java since we compile it using a separate command, i.e. `javac helloWorld.java`, and it produces a `.class` file which we know as _bytecode_. Well, python is very similar to that. One difference here is that no separate compile command/binary is needed to run a python program.
**What is the difference then, between java and python?**
Well, Java's compiler is stricter and more sophisticated. As you might know, Java is a statically typed language, so the compiler is written in a way that it can verify type-related errors at compile time. Python, being a _dynamically typed_ language, does not know the types until the program is run. So in a way, the python compiler is dumb (or, less strict). But there indeed is a compile step involved when a python program is run. You might have seen python bytecode files with the `.pyc` extension. Here is how you can see the bytecode for a given python program.
```bash
# Create a Hello World
spatel1-mn1:tmp spatel1$ echo "print('hello world')" > hello_world.py
# Making sure it runs
spatel1-mn1:tmp spatel1$ python3 hello_world.py
hello world
# The bytecode of the given program
spatel1-mn1:tmp spatel1$ python -m dis hello_world.py
1 0 LOAD_NAME 0 (print)
2 LOAD_CONST 0 ('hello world')
4 CALL_FUNCTION 1
6 POP_TOP
8 LOAD_CONST 1 (None)
10 RETURN_VALUE
```
Read more about the dis module [here](https://docs.python.org/3/library/dis.html).
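Incidentally, those `.pyc` files mentioned above can be produced without even running the program (illustrative; the exact file name depends on your python version):
```bash
# Compile only, without running; the bytecode lands under __pycache__
python3 -m py_compile hello_world.py
ls __pycache__/    # e.g. hello_world.cpython-38.pyc (name depends on your python version)
```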
Now coming to C/C++, there of course is a compiler. But the output is different from what the java/python compilers would produce. Compiling a C program produces what we know as _machine code_, as opposed to bytecode.
### Running The Programs
We know compilation is involved in all 3 languages we are discussing. It's just that the compilers are different in nature and output different types of content. In the case of C/C++, the output is machine code, which can be directly read by your operating system. When you execute that program, your OS will know exactly how to run it. **But this is not the case with bytecode.**
That bytecode is language-specific. Python has its own set of bytecodes defined (more in the `dis` module) and so does java. So naturally, your operating system will not know how to run it. To run this bytecode, we have something called virtual machines, i.e. the JVM or the Python VM (CPython, Jython). These so-called virtual machines are programs which can read the bytecode and run it on a given operating system. Python has multiple VMs available: CPython is a python VM implemented in the C language; similarly, Jython is a Java implementation of the python VM. **At the end of the day, what they should be capable of is understanding the python language syntax, compiling it to bytecode and running that bytecode.** You can implement a python VM in any language! (And people do so, just because it can be done.)
```
The Operating System
+------------------------------------+
| |
| |
| |
hello_world.py Python bytecode | Python VM Process |
| |
+----------------+ +----------------+ | +----------------+ |
|print(... | COMPILE |LOAD_CONST... | | |Reads bytecode | |
| +--------------->+ +------------------->+line by line | |
| | | | | |and executes. | |
| | | | | | | |
+----------------+ +----------------+ | +----------------+ |
| |
| |
| |
hello_world.c OS Specific machinecode | A New Process |
| |
+----------------+ +----------------+ | +----------------+ |
|void main() { | COMPILE | binary contents| | | binary contents| |
| +--------------->+ +------------------->+ | |
| | | | | | | |
| | | | | | | |
+----------------+ +----------------+ | +----------------+ |
| (binary contents |
| runs as is) |
| |
| |
+------------------------------------+
```
Two things to note about the above diagram:
1. Generally, when we run a python program, a python VM process is started which reads the python source code, compiles it to bytecode and runs it in a single step. Compiling is not a separate step; it is shown separately only for illustration purposes.
2. Binaries generated for C-like languages are not _exactly_ run as-is. Since there are multiple types of binaries (eg: ELF), there are more complicated steps involved in order to run a binary, but we will not go into that since all of it is done at the OS level.


@ -0,0 +1,162 @@
# Some Python Concepts
Though you are expected to know python and its syntax at a basic level, let us discuss some fundamental concepts that will help you understand the python language better.
**Everything in Python is an object.**
That includes functions, lists, dicts, classes, modules, a running function (instance of a function definition), everything. In CPython, this means there is an underlying struct variable for each object.
In python's current execution context, all the variables are stored in a dict. It is a string-to-object mapping. If you have a function and a float variable defined in the current context, here is how it is handled internally.
```python
>>> float_number=42.0
>>> def foo_func():
... pass
...
# NOTICE HOW VARIABLE NAMES ARE STRINGS, stored in a dict
>>> locals()
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'float_number': 42.0, 'foo_func': <function foo_func at 0x1055847a0>}
```
## Python Functions
Since functions too are objects, we can see all the attributes a function contains as follows:
```python
>>> def hello(name):
... print(f"Hello, {name}!")
...
>>> dir(hello)
['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__',
'__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__',
'__hash__', '__init__', '__init_subclass__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__',
'__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
'__subclasshook__']
```
While there are a lot of them, let's look at some interesting ones:
### `__globals__`
This attribute, as the name suggests, has references to the global variables. If you ever need to know what global variables are in the scope of this function, this will tell you. See how the function starts seeing the new variable in globals:
```python
>>> hello.__globals__
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'hello': <function hello at 0x7fe4e82554c0>}
# adding new global variable
>>> GLOBAL="g_val"
>>> hello.__globals__
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'hello': <function hello at 0x7fe4e82554c0>, 'GLOBAL': 'g_val'}
```
### `__code__`
This is an interesting one! As everything in python is an object, this includes the bytecode too. The compiled python bytecode is a python code object, which is accessible via the `__code__` attribute. A function has an associated code object which carries some interesting information.
```python
# the file in which function is defined
# stdin here since this is run in an interpreter
>>> hello.__code__.co_filename
'<stdin>'
# number of arguments the function takes
>>> hello.__code__.co_argcount
1
# local variable names
>>> hello.__code__.co_varnames
('name',)
# the function code's compiled bytecode
>>> hello.__code__.co_code
b't\x00d\x01|\x00\x9b\x00d\x02\x9d\x03\x83\x01\x01\x00d\x00S\x00'
```
There are more code attributes, which you can list with `>>> dir(hello.__code__)`.
## Decorators
Related to functions, python has another feature called decorators. Let's see how that works, keeping `everything is an object` in mind.
Here is a sample decorator:
```python
>>> def deco(func):
... def inner():
... print("before")
... func()
... print("after")
... return inner
...
>>> @deco
... def hello_world():
... print("hello world")
...
>>>
>>> hello_world()
before
hello world
after
```
Here the `@deco` syntax is used to decorate the `hello_world` function. It is essentially the same as doing:
```python
>>> def hello_world():
... print("hello world")
...
>>> hello_world = deco(hello_world)
```
What goes inside the `deco` function might seem complex. Let's try to uncover it.
1. Function `hello_world` is created
2. It is passed to the `deco` function
3. `deco` creates a new function
    1. This new function calls the `hello_world` function
    2. And does a couple of other things
4. `deco` returns the newly created function
5. `hello_world` is replaced with the above function
Let's visualize it for better understanding
```
BEFORE function_object (ID: 100)
"hello_world" +--------------------+
+ |print("hello_world")|
| | |
+--------------> | |
| |
+--------------------+
WHAT DECORATOR DOES
creates a new function (ID: 101)
+---------------------------------+
|input arg: function with id: 100 |
| |
|print("before") |
|call function object with id 100 |
|print("after") |
| |
+---------------------------^-----+
|
|
AFTER |
|
|
"hello_world" +-------------+
```
Note how the `hello_world` name points to a new function object but that new function object knows the reference (ID) of the original function.
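We can quickly verify this replacement in the same interpreter session (illustrative):
```python
# The name hello_world now refers to the inner function created by deco...
>>> hello_world.__name__
'inner'
# ...and that inner function still holds the original function in its closure
>>> hello_world.__closure__[0].cell_contents.__name__
'hello_world'
```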
## Some Gotchas
- While it is very quick to build prototypes in python and there are tons of libraries available, as the codebase complexity increases, type errors become more common and get harder to deal with. (There are solutions to this problem, like type annotations in python. Check out [mypy](http://mypy-lang.org/).)
- Because python is a dynamically typed language, all types are determined at runtime. And that makes python run very slowly compared to other statically typed languages.
- Python has something called the [GIL](https://www.dabeaz.com/python/UnderstandingGIL.pdf) (global interpreter lock) which is a limiting factor for utilizing multiple CPU cores for parallel computation.
- Some weird things that python does: https://github.com/satwikkansal/wtfpython


@ -0,0 +1,56 @@
# Python, Web and Flask
Back in the old days, websites were simple. They served simple static HTML content. A webserver would be listening on a defined port and, according to the HTTP request received, it would read files from disk and return them in the response. But since then, the complexity has evolved and websites are now dynamic. Depending on the request, multiple operations need to be performed, like reading from a database or calling other APIs, and finally returning some response (HTML data, JSON content etc.)
Since serving web requests is no longer a simple task like reading files from disk and returning their contents, we need to process each HTTP request, perform some operations programmatically and construct a response.
## Sockets
Though we have frameworks like flask, HTTP is still a protocol that works over the TCP protocol. So let us set up a TCP server, send an HTTP request and inspect the request's payload. Note that this is not a tutorial on socket programming; what we are doing here is inspecting the HTTP protocol at the ground level and looking at what its contents look like. (Ref: [Socket Programming in Python (Guide) on RealPython](https://realpython.com/python-sockets/))
```python
import socket

HOST = '127.0.0.1'  # Standard loopback interface address (localhost)
PORT = 65432        # Port to listen on (non-privileged ports are > 1023)

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind((HOST, PORT))
    s.listen()
    conn, addr = s.accept()
    with conn:
        print('Connected by', addr)
        while True:
            data = conn.recv(1024)
            if not data:
                break
            print(data)
```
Then we open `localhost:65432` in our web browser, and the following is the output:
```bash
Connected by ('127.0.0.1', 54719)
b'GET / HTTP/1.1\r\nHost: localhost:65432\r\nConnection: keep-alive\r\nDNT: 1\r\nUpgrade-Insecure-Requests: 1\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36 Edg/85.0.564.44\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\nSec-Fetch-Site: none\r\nSec-Fetch-Mode: navigate\r\nSec-Fetch-User: ?1\r\nSec-Fetch-Dest: document\r\nAccept-Encoding: gzip, deflate, br\r\nAccept-Language: en-US,en;q=0.9\r\n\r\n'
```
Examine it closely and the content will look like the HTTP protocol's format, i.e.:
```
HTTP_METHOD URI_PATH HTTP_VERSION
HEADERS_SEPARATED_BY_SEPARATOR
```
So though it's a blob of bytes, knowing the [HTTP protocol specification](https://tools.ietf.org/html/rfc2616), you can parse that string (i.e. split it by `\r\n`) and get meaningful information out of it.
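For instance, here is a minimal parsing sketch (not a full HTTP parser; the raw bytes below are a shortened version of the request captured above):
```python
# Split raw request bytes into the request line and headers, per the HTTP format
raw = b'GET / HTTP/1.1\r\nHost: localhost:65432\r\nAccept: text/html\r\n\r\n'

head, _, body = raw.partition(b'\r\n\r\n')                  # headers vs body
request_line, *header_lines = head.decode().split('\r\n')

method, path, version = request_line.split(' ')
headers = dict(line.split(': ', 1) for line in header_lines)

print(method, path, version)   # GET / HTTP/1.1
print(headers['Host'])         # localhost:65432
```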
## Flask
Flask and other such frameworks do pretty much what we just discussed in the last section (with more sophistication added). They listen on a port on a TCP socket, receive an HTTP request, parse the data according to the protocol format and make it available to you in a convenient manner.
For example, you can access headers in flask via `request.headers`, which is made available to you by splitting the above payload by `\r\n`, as defined in the HTTP protocol.
Another example: we register routes in flask with `@app.route("/hello")`. What flask does is maintain a registry internally which maps `/hello` to the function you decorated. Now whenever a request comes in with the `/hello` route (the second component of the first line, split by space), flask calls the registered function and returns whatever the function returned.
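Here is a toy sketch of that registry idea (this is not flask's actual implementation, just an illustration of the mapping):
```python
# A minimal route registry: map URL paths to handler functions via a decorator
routes = {}

def route(path):
    def register(func):
        routes[path] = func        # remember which function handles this path
        return func
    return register

@route("/hello")
def hello_handler():
    return "hello!"

# When a request line like "GET /hello HTTP/1.1" arrives,
# look up the path and call the registered function
path = "GET /hello HTTP/1.1".split(" ")[1]
print(routes[path]())  # hello!
```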
The same goes for all other web frameworks in other languages too. They all work on similar principles. What they basically do is understand the HTTP protocol, parse the HTTP request data and give us programmers a nice interface to work with HTTP requests.
Not so much magic, innit?


@ -0,0 +1,45 @@
# SRE Parts of The App and Conclusion
## Scaling The App
The design and development are just a part of the journey. We will need to set up continuous integration and continuous delivery pipelines sooner or later, and we have to deploy this app somewhere.
Initially, we can start by deploying this app on one virtual machine on any cloud provider. But this is a `single point of failure`, which is something we never allow as an SRE (or even as an engineer). So an improvement here would be having multiple instances of the application deployed behind a load balancer. This certainly prevents the problem of one machine going down.
Scaling here would mean adding more instances behind the load balancer. But this is scalable only up to a certain point. After that, other bottlenecks in the system will start appearing, i.e. the DB will become the bottleneck, or perhaps the load balancer itself. How do you know which part is the bottleneck? You need to have observability into each aspect of the application architecture.
Only after you have metrics will you be able to know what is going wrong where. **What gets measured, gets fixed!**
Get deeper insights into scaling from School of SRE's Scalability module and, after going through it, apply your learnings and takeaways to this app. Think about how we would make this app geographically distributed, highly available and scalable.
## Monitoring Strategy
Once we have our application deployed, it will be working okay. But not forever. Reliability is in the title of our job, and we make systems reliable by designing them in a certain way. But things will still go down. Machines will fail. Disks will behave weirdly. Buggy code will get pushed to production. And all of these possible scenarios will make the system less reliable. So what do we do? **We monitor!**
We keep an eye on the system's health, and if anything is not going as expected, we want to get alerted.
Now let's think in terms of the given URL shortening app. We need to monitor it, and we would want to get notified in case something goes wrong. But we first need to decide what that _something_ is that we want to keep an eye on.
1. Since it's a web app serving HTTP requests, we want to keep an eye on HTTP Status codes and latencies
2. Request volume again is a good candidate; if the app is receiving an unusual amount of traffic, something might be off.
3. We also want to keep an eye on the database, depending on the database solution chosen: query times, query volumes, disk usage etc.
4. Finally, there also needs to be some external monitoring which runs periodic tests from devices outside of your data centers. This emulates customers and ensures that, from the customer's point of view, the system is working as expected (a minimal probe sketch follows this list).
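The following is a minimal sketch of such an external probe; the URL and the latency budget are hypothetical assumptions, and a real setup would use a proper monitoring tool rather than a one-off script:
```python
# Periodically-run probe: checks the two signals called out above - status code and latency
import time
import urllib.request
from urllib.error import URLError

URL = "http://localhost:5000/r/a62a4"  # hypothetical: a shortened link served by our app
LATENCY_BUDGET = 0.5                   # seconds, an assumed target

start = time.monotonic()
try:
    with urllib.request.urlopen(URL, timeout=5) as resp:
        latency = time.monotonic() - start
        if latency > LATENCY_BUDGET:
            print(f"ALERT: slow response, status={resp.status}, latency={latency:.2f}s")
        else:
            print(f"OK: status={resp.status}, latency={latency:.2f}s")
except URLError as exc:
    # Connection failures and HTTP error statuses both end up here
    print(f"ALERT: probe failed: {exc}")
```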
## SRE Use-cases
In the world of SRE, python is a widely used language for small scripts and tooling developed for various purposes. Since tooling developed by SREs works with critical pieces of infrastructure and has great power (to bring things down), it is important to know what you are doing while using a programming language and its features. It is equally important to know the language and its characteristics while debugging issues. As an SRE, having a deeper understanding of the python language has helped me a lot in debugging very sneaky bugs, and in being generally more aware and informed while making certain design decisions.
While developing tools may or may not be part of the SRE job, supporting tools or services is more likely to be a daily duty. Building an application or tool is just a small part of productionization. While there is certainly effort that goes into the design of the application itself to make it more robust, as an SRE you are responsible for its reliability and stability once it is deployed and running. And to ensure that, you'd need to understand the application first, then come up with a strategy to monitor it properly and be prepared for various failure scenarios.
## Optional Exercises
1. Make a decorator that will cache function return values depending on input parameters.
2. Host the URL shortening app on any cloud provider.
3. Setup monitoring using many of the tools available like catchpoint, datadog etc.
4. Create a minimal flask-like framework on top of TCP sockets.
## Conclusion
The first part of this module aims to make you more aware of the things that happen when you choose python as your programming language and of what happens when you run a python program. With the knowledge of how python handles things internally as objects, a lot of seemingly magical things in python will start to make more sense.
The second part first explains how a framework like flask works, using existing knowledge of protocols like TCP and HTTP. It then touches on the whole application development lifecycle, including the SRE parts of it. While the design and architecture areas considered are not exhaustive, it gives a good overview of things that are also important as an SRE, and why they are important.


@ -0,0 +1,120 @@
# The URL Shortening App
Let's build a very simple URL shortening app using flask and try to incorporate all aspects of the development process, including the reliability aspects. We will not be building the UI, and we will come up with a minimal set of APIs that is enough for the app to function well.
## Design
We don't jump directly to coding. The first thing we do is gather requirements, come up with an approach, have the approach/design reviewed by peers, evolve, iterate, document the decisions and tradeoffs, and then finally implement. While we will not write a full-blown design document here, we will raise certain questions that are important to the design.
### 1. High Level Operations and API Endpoints
Since it's a URL shortening app, we will need an API for generating the shortened link given an original link, and an API/endpoint which will accept the shortened link and redirect to the original URL. We are not including the user aspect of the app, to keep things minimal. These two APIs should make the app functional and usable by anyone.
### 2. How to shorten?
Given a URL, we will need to generate a shortened version of it. One approach could be using random characters for each link. Another thing that can be done is to use some sort of hashing algorithm. The benefit here is that we will reuse the same hash for the same link, i.e. if a lot of people are shortening `https://www.linkedin.com`, they will all get the same value, compared to multiple entries in the DB if random characters were chosen.
What about hash collisions? Even in the random-characters approach, though the probability is lower, collisions can happen, and we need to be mindful of them. In that case, we might want to prepend/append the string with some random value to avoid the conflict.
Also, the choice of hash algorithm matters. We will need to analyze algorithms, their CPU requirements and their characteristics, and choose the one that suits the most.
### 3. Is URL Valid?
Given a URL to shorten, how do we verify that the URL is valid? Do we even verify or validate? One basic check is to see if the URL matches a URL regex (a minimal sketch of such a check follows the list below). To go even further, we could try opening/visiting the URL. But there are certain gotchas here.
1. We need to define success criteria. ie: HTTP 200 means it is valid.
2. What if the URL is in a private network?
3. What if the URL is temporarily down?
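The regex below is an illustrative assumption, not an exhaustive URL grammar; it only shows the shape of such a check:
```python
# Minimal regex-based URL validity check (illustrative pattern)
import re

URL_RE = re.compile(r"^https?://[\w.-]+(:\d+)?(/\S*)?$")

def looks_like_url(url: str) -> bool:
    return URL_RE.match(url) is not None

print(looks_like_url("https://www.linkedin.com"))  # True
print(looks_like_url("not a url"))                 # False
```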
### 4. Storage
Finally, storage. Where will we store the data that we generate over time? There are multiple database solutions available, and we will need to choose the one that suits this app the most. A relational database like MySQL would be a fair choice, but **be sure to check out School of SRE's database section for deeper insights into making a more informed decision.**
### 5. Other
We are not accounting for users in our app, or other possible features like rate limiting, customized links etc., but they will eventually come up with time. Depending on the requirements, they too might need to be incorporated.
The minimal working code is given below for reference but I'd encourage you to come up with your own.
```python
from flask import Flask, redirect, request
from hashlib import md5

app = Flask("url_shortener")

mapping = {}


@app.route("/shorten", methods=["POST"])
def shorten():
    global mapping
    payload = request.json

    if "url" not in payload:
        return "Missing URL Parameter", 400

    # TODO: check if URL is valid

    hash_ = md5()
    hash_.update(payload["url"].encode())
    digest = hash_.hexdigest()[:5]  # limiting to 5 chars; the smaller the limit, the higher the chance of collision

    if digest not in mapping:
        mapping[digest] = payload["url"]
        return f"Shortened: r/{digest}\n"
    else:
        # TODO: check for hash collision
        return f"Already exists: r/{digest}\n"


@app.route("/r/<hash_>")
def redirect_(hash_):
    if hash_ not in mapping:
        return "URL Not Found", 404
    return redirect(mapping[hash_])


if __name__ == "__main__":
    app.run(debug=True)
"""
OUTPUT:
===> SHORTENING
spatel1-mn1:tmp spatel1$ curl localhost:5000/shorten -H "content-type: application/json" --data '{"url":"https://linkedin.com"}'
Shortened: r/a62a4
===> REDIRECTING, notice the response code 302 and the location header
spatel1-mn1:tmp spatel1$ curl localhost:5000/r/a62a4 -v
* Uses proxy env variable NO_PROXY == '127.0.0.1'
* Trying ::1...
* TCP_NODELAY set
* Connection failed
* connect to ::1 port 5000 failed: Connection refused
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 5000 (#0)
> GET /r/a62a4 HTTP/1.1
> Host: localhost:5000
> User-Agent: curl/7.64.1
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 302 FOUND
< Content-Type: text/html; charset=utf-8
< Content-Length: 247
< Location: https://linkedin.com
< Server: Werkzeug/0.15.4 Python/3.7.7
< Date: Tue, 27 Oct 2020 09:37:12 GMT
<
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>Redirecting...</title>
<h1>Redirecting...</h1>
* Closing connection 0
<p>You should be redirected automatically to target URL: <a href="https://linkedin.com">https://linkedin.com</a>. If not click the link.
"""
```


@ -2,6 +2,16 @@ site_name: school_of_sre
docs_dir: courses
nav:
- Home: index.md
- Git:
- Git Basics: git/git-basics.md
- Working With Branches: git/branches.md
- Github and Hooks: git/github-hooks.md
- Python and Web:
- Intro: python_web/intro.md
- Some Python Concepts: python_web/python-concepts.md
- Python, Web and Flask: python_web/python-web-flask.md
- The URL Shortening App: python_web/url-shorten-app.md
- SRE Aspects of The App and Conclusion: python_web/sre-conclusion.md
- Systems Design:
- Intro: systems_design/intro.md
- Scalability: systems_design/scalability.md