Merge pull request #13 from linkedin/networking

Networking
This commit is contained in:
kalyan 2020-11-09 18:09:23 +05:30 committed by GitHub
commit f078658625
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
11 changed files with 391 additions and 0 deletions


@@ -0,0 +1,142 @@
# DNS
Domain names are the simple human-readable names for websites. The Internet understands only IP addresses, but since memorizing incoherent numbers is not practical, domain names are used instead. These domain names are translated into IP addresses by the DNS infrastructure. When somebody tries to open www.linkedin.com in the browser, the browser tries to convert www.linkedin.com to an IP address. This process is called DNS resolution. A simple pseudocode depicting this process looks like this:
```python
ip, err = getIPAddress(domainName)
if err:
    print("unknown host exception while trying to resolve: {}".format(domainName))
```
Now let's try to understand what happens inside the getIPAddress function. The browser has a DNS cache of its own, where it checks if a mapping of the domainName to an IP address is already available, in which case the browser uses that IP address. If no such mapping exists, the browser calls the gethostbyname library function to ask the operating system to find the IP address for the given domainName.
```python
def getIPAddress(domainName):
    resp, fail = lookupCache(domainName)
    if not fail:
        return resp, None
    else:
        resp, err = gethostbyname(domainName)
        if err:
            return None, err
        else:
            return resp, None
```
Now let's understand what the operating system does when the gethostbyname function is called. The Linux operating system looks at the file [/etc/nsswitch.conf](https://man7.org/linux/man-pages/man5/nsswitch.conf.5.html), which usually has a line
```bash
hosts: files dns
```
This line means the OS has to look up first in the file /etc/hosts and then use the DNS protocol to do the resolution if there is no match in /etc/hosts.
The file /etc/hosts is of the format
IPAddress FQDN [FQDN...]
```bash
127.0.0.1 localhost.localdomain localhost
::1 localhost.localdomain localhost
```
If a match exists for a domain in this file then that IP address is returned by the OS. Let's add a line to this file
```bash
127.0.0.1 test.linkedin.com
```
And then do ping test.linkedin.com
```bash
ping test.linkedin.com -n
```
```bash
PING test.linkedin.com (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.047 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.036 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.037 ms
```
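The same lookup can be reproduced from code. Below is a minimal sketch using Python's standard library; socket.gethostbyname goes through the same nsswitch.conf order (/etc/hosts first, then the resolvers in /etc/resolv.conf):
```python
import socket

# Resolves via /etc/hosts first, then DNS, per the hosts line in /etc/nsswitch.conf
print(socket.gethostbyname("test.linkedin.com"))   # 127.0.0.1, from the /etc/hosts entry above
print(socket.gethostbyname("www.linkedin.com"))    # resolved over DNS
```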
As mentioned earlier, if no match exists in /etc/hosts, the OS tries to do a DNS resolution using the DNS protocol. The Linux system makes a DNS request to the first IP in /etc/resolv.conf. If there is no response, requests are sent to the subsequent servers in resolv.conf. These servers in resolv.conf are called DNS resolvers. The DNS resolvers are populated by [DHCP](https://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol) or statically configured by an administrator.
[Dig](https://linux.die.net/man/1/dig) is a userspace DNS client which creates and sends requests to DNS resolvers and prints the response it receives on the console.
```bash
#run this command in one shell to capture all DNS requests
sudo tcpdump -s 0 -A -i any port 53
#make a dig request from another shell
dig linkedin.com
```
```bash
13:19:54.432507 IP 172.19.209.122.56497 > 172.23.195.101.53: 527+ [1au] A? linkedin.com. (41)
....E..E....@.n....z...e...5.1.:... .........linkedin.com.......)........
13:19:54.485131 IP 172.23.195.101.53 > 172.19.209.122.56497: 527 1/0/1 A 108.174.10.10 (57)
....E..U..@.|. ....e...z.5...A...............linkedin.com..............3..l.
..)........
```
The packet capture shows a request is made to 172.23.195.101:53 (this is the resolver in /etc/resolv.conf) for linkedin.com and a response is received from 172.23.195.101 with the IP address of linkedin.com, 108.174.10.10.
Now let's try to understand how the DNS resolver finds the IP address of linkedin.com. The DNS resolver first looks at its cache. Since many devices in the network can query for the domain name linkedin.com, the name resolution result may already exist in the cache. If there is a cache miss, it starts the DNS resolution process. The DNS server breaks “linkedin.com” into “.”, “com.” and “linkedin.com.” and starts DNS resolution from “.”. The “.” is called the root domain, and its IPs are known to the DNS resolver software. The DNS resolver queries the root domain nameservers to find the right nameservers which can respond with details for “com.”. The address of the authoritative nameserver of “com.” is returned. Now the DNS resolution service contacts the authoritative nameserver for “com.” to fetch the authoritative nameserver for “linkedin.com”. Once an authoritative nameserver of “linkedin.com” is known, the resolver contacts LinkedIn's nameserver to get the IP address of “linkedin.com”. This whole process can be visualized by running
```bash
dig +trace linkedin.com
```
```bash
linkedin.com. 3600 IN A 108.174.10.10
```
This DNS response has 5 fields where the first field is the request and the last field is the response. The second field is the Time To Live, which says how long the DNS response is valid, in seconds. In this case this mapping of linkedin.com is valid for 1 hour. This is how the resolvers and the application (browser) maintain their cache. Any request for linkedin.com beyond 1 hour will be treated as a cache miss, as the mapping has exceeded its TTL, and the whole process has to be redone.
The 4th field says the type of the DNS request/response. Some of the various DNS query types are
A, AAAA, NS, TXT, PTR, MX and CNAME.
- A record returns the IPv4 address of the domain name
- AAAA record returns the IPv6 address of the domain name
- NS record returns the authoritative nameserver for the domain name
- CNAME records are aliases to domain names. Some domains point to other domain names, and resolving the latter domain name gives an IP which is used as the IP for the former domain name as well. For example, www.linkedin.com's IP address is the same as that of 2-01-2c3e-005a.cdx.cedexis.net.
- For brevity we are not discussing other DNS record types; the RFCs of each of these records are available [here](https://en.wikipedia.org/wiki/List_of_DNS_record_types).
```bash
dig A linkedin.com +short
108.174.10.10
dig AAAA linkedin.com +short
2620:109:c002::6cae:a0a
dig NS linkedin.com +short
dns3.p09.nsone.net.
dns4.p09.nsone.net.
dns2.p09.nsone.net.
ns4.p43.dynect.net.
ns1.p43.dynect.net.
ns2.p43.dynect.net.
ns3.p43.dynect.net.
dns1.p09.nsone.net.
dig www.linkedin.com CNAME +short
2-01-2c3e-005a.cdx.cedexis.net.
```
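The same queries can also be made programmatically. Below is a minimal sketch using the third-party dnspython package (pip install dnspython); the record types mirror the dig commands above:
```python
import dns.resolver  # third-party package: dnspython

# Query a few record types for linkedin.com, like the dig examples above
for rtype in ("A", "AAAA", "NS"):
    answer = dns.resolver.resolve("linkedin.com", rtype)
    print(rtype, "TTL", answer.rrset.ttl)
    for rdata in answer:
        print("  ", rdata.to_text())
```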
Armed with these fundamentals of DNS, let's see use cases where DNS is used by SREs.
## Applications in SRE role
This section covers some of the common solutions an SRE can derive from DNS
1. Every company has to have its internal DNS infrastructure for intranet sites and internal services like databases and other internal applications like wiki. So there has to be a DNS infrastructure maintained for those domain names by the infrastructure team. This DNS infrastructure has to be optimized and scaled so that it doesn't become a single point of failure. Failure of the internal DNS infrastructure can cause API calls of microservices to fail and have other cascading effects.
2. DNS can also be used for discovering services. For example the hostname serviceb.internal.example.com could list the instances which run service b internally in the example.com company. Cloud providers provide options to enable DNS discovery ([example](https://docs.aws.amazon.com/whitepapers/latest/microservices-on-aws/service-discovery.html#dns-based-service-discovery)); a minimal client-side sketch follows this list.
3. DNS is used by cloud providers and CDN providers to scale their services. In Azure/AWS, load balancers are given a CNAME instead of an IP address. They update the IP address of the load balancers as they scale by changing the IP address of the alias domain names. This is one of the reasons why A records of such alias domains are short lived, like 1 minute.
4. DNS can also be used to make clients get IP addresses closer to their location so that their HTTP calls can be responded to faster if the company has a geographically distributed presence.
5. An SRE also has to understand that since there is no verification in the DNS infrastructure, these responses can be spoofed. This is safeguarded by other protocols like HTTPS (dealt with later). DNSSEC protects from forged or manipulated DNS responses.
6. Stale DNS caches can be a problem. Some [apps](https://stackoverflow.com/questions/1256556/how-to-make-java-honor-the-dns-caching-timeout) might still be using expired DNS records for their API calls. This is something an SRE has to be wary of when doing maintenance.
7. DNS load balancing and service discovery also have to take TTL into account, and servers can be removed from the pool only after waiting for the TTL once changes are made to the DNS records. If this is not done, a certain portion of the traffic will fail as the server is removed before clients' cached records expire.
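As referenced in point 2, here is a minimal client-side sketch of DNS-based service discovery. The service name serviceb.internal.example.com and port are hypothetical and used purely for illustration; the standard library's getaddrinfo simply returns every address record the resolver knows about:
```python
import random
import socket

SERVICE = "serviceb.internal.example.com"  # hypothetical internal service name
PORT = 8080                                # hypothetical service port

def pick_backend(service, port):
    # getaddrinfo returns all A/AAAA records for the name; picking one at random
    # gives a crude client-side load balancing across the instances behind it
    records = socket.getaddrinfo(service, port, proto=socket.IPPROTO_TCP)
    family, socktype, proto, canonname, sockaddr = random.choice(records)
    return sockaddr[0]

# backend_ip = pick_backend(SERVICE, PORT)
```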


@@ -0,0 +1,129 @@
# HTTP
Till this point we have only got the IP address of linkedin.com. The HTML page of linkedin.com is served over the HTTP protocol, which the browser renders. The browser sends an HTTP request to the IP of the server determined above.
A request has a verb (GET, PUT, POST, etc.) followed by a path and query parameters, then lines of key-value pairs (headers) which give information about the client and its capabilities, like the contents it can accept, and a body (usually in POST or PUT).
```bash
# Eg run the following in your container and have a look at the headers
curl linkedin.com -v
```
```bash
* Connected to linkedin.com (108.174.10.10) port 80 (#0)
> GET / HTTP/1.1
> Host: linkedin.com
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Date: Mon, 09 Nov 2020 10:39:43 GMT
< X-Li-Pop: prod-esv5
< X-LI-Proto: http/1.1
< Location: https://www.linkedin.com/
< Content-Length: 0
<
* Connection #0 to host linkedin.com left intact
* Closing connection 0
```
Here, in the first line, GET is the verb, / is the path and 1.1 is the HTTP protocol version. Then there are key-value pairs which give the client's capabilities and some details to the server. The server responds with the HTTP version, a status code and a status message. Status codes in the 2xx range mean success, 3xx denotes redirection, 4xx denotes client-side errors and 5xx denotes server-side errors.
We will now jump in to see the difference between HTTP/1.0 and HTTP/1.1.
```bash
#On the terminal type
telnet www.linkedin.com 80
#Copy and paste the following with an empty new line at last in the telnet STDIN
GET / HTTP/1.1
HOST:linkedin.com
USER-AGENT: curl
```
This gets the server's response and then waits for the next input, as the underlying connection to www.linkedin.com can be reused for further requests. While going through TCP, we can understand the benefits of this. In HTTP/1.0, however, this connection is closed immediately after the response, meaning a new connection has to be opened for each request. HTTP/1.1 can have only one inflight request on an open connection, but the connection can be reused for multiple requests one after another. One of the benefits of HTTP/2.0 over HTTP/1.1 is that we can have multiple inflight requests on the same connection. We are restricting our scope to generic HTTP and not jumping into the intricacies of each protocol version, but they should be straightforward to understand after the course.
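The connection reuse described above can also be seen from code. Below is a minimal sketch using Python's http.client, where the same TCP connection carries two HTTP/1.1 requests back to back:
```python
import http.client

# One TCP connection, reused for two sequential HTTP/1.1 requests (keep-alive)
conn = http.client.HTTPConnection("www.linkedin.com", 80)
for _ in range(2):
    conn.request("GET", "/", headers={"User-Agent": "curl/7.64.1"})
    resp = conn.getresponse()
    print(resp.status, resp.reason)
    resp.read()   # drain the body so the connection can be reused for the next request
conn.close()
```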
HTTP is called a **stateless protocol**. In this section we will try to understand what stateless means. Say we logged in to linkedin.com; each request to linkedin.com from the client will have no context of the user, and it makes no sense to prompt the user to log in for each page/resource. This problem of HTTP is solved by the *COOKIE*. A session is created for a user when the user logs in. This session identifier is sent to the browser via the *SET-COOKIE* header. The browser stores the cookie till the expiry set by the server and sends the cookie with each request to linkedin.com from here on. More details on cookies are available [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies). Cookies are a critical piece of information, like a password, and since HTTP is a plain-text protocol, any man in the middle can capture either the password or the cookies and breach the privacy of the user. Similarly, as discussed during DNS, a spoofed IP of linkedin.com can cause a phishing attack on users, where a user can give their LinkedIn password to log in on the malicious site. To solve both problems HTTPS came in place and HTTPS has to be mandated.
HTTPS has to provide server identification and encryption of data between client and server. The server administrator has to generate a private/public key pair and a certificate request. This certificate request has to be signed by a certificate authority, which converts the certificate request into a certificate. The server administrator then has to install the certificate and private key on the webserver. The certificate has details about the server (like the domain name it serves and the expiry date) and the public key of the server. The private key is a secret of the server, and losing the private key loses the trust the server provides. When a client connects, the client sends a HELLO. The server sends its certificate to the client. The client checks the validity of the cert by seeing if it is within its expiry time, if it is signed by a trusted authority and if the hostname in the cert is the same as the server's. This validation makes sure the server is the right server and there is no phishing. Once that is validated, the client negotiates a symmetric key and cipher with the server by encrypting the negotiation with the public key of the server. Nobody other than the server, which has the private key, can understand this data. Once negotiation is complete, that symmetric key and algorithm is used for further encryption, which can be decrypted only by the client and server from there on, as only they know the symmetric key and algorithm. The switch from an asymmetric encryption algorithm to a symmetric algorithm is to not strain the resources of client devices, as symmetric encryption is generally less resource intensive than asymmetric.
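The handshake and certificate validation described above can also be observed from code. Below is a minimal sketch using Python's ssl module, which prints roughly the same certificate fields the curl output below shows:
```python
import socket
import ssl

ctx = ssl.create_default_context()   # loads the system's trusted CA bundle
with socket.create_connection(("www.linkedin.com", 443)) as sock:
    # server_hostname enables SNI and hostname verification against the cert
    with ctx.wrap_socket(sock, server_hostname="www.linkedin.com") as tls:
        cert = tls.getpeercert()
        print("protocol/cipher:", tls.version(), tls.cipher()[0])
        print("subject:", cert["subject"])
        print("issuer:", cert["issuer"])
        print("not valid after:", cert["notAfter"])
```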
```bash
#Try the following on your terminal to see the cert details like Subject Name(domain name), Issuer details, Expiry date
curl https://www.linkedin.com -v
```
```bash
* Connected to www.linkedin.com (13.107.42.14) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/cert.pem
CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [230 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [90 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [3171 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [365 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [102 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: C=US; ST=California; L=Sunnyvale; O=LinkedIn Corporation; CN=www.linkedin.com
* start date: Oct 2 00:00:00 2020 GMT
* expire date: Apr 2 12:00:00 2021 GMT
* subjectAltName: host "www.linkedin.com" matched cert's "www.linkedin.com"
* issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fb055808200)
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
0 82117 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
* Connection #0 to host www.linkedin.com left intact
HTTP/2 200
cache-control: no-cache, no-store
pragma: no-cache
content-length: 82117
content-type: text/html; charset=utf-8
expires: Thu, 01 Jan 1970 00:00:00 GMT
set-cookie: JSESSIONID=ajax:2747059799136291014; SameSite=None; Path=/; Domain=.www.linkedin.com; Secure
set-cookie: lang=v=2&lang=en-us; SameSite=None; Path=/; Domain=linkedin.com; Secure
set-cookie: bcookie="v=2&70bd59e3-5a51-406c-8e0d-dd70befa8890"; domain=.linkedin.com; Path=/; Secure; Expires=Wed, 09-Nov-2022 22:27:42 GMT; SameSite=None
set-cookie: bscookie="v=1&202011091050107ae9b7ac-fe97-40fc-830d-d7a9ccf80659AQGib5iXwarbY8CCBP94Q39THkgUlx6J"; domain=.www.linkedin.com; Path=/; Secure; Expires=Wed, 09-Nov-2022 22:27:42 GMT; HttpOnly; SameSite=None
set-cookie: lissc=1; domain=.linkedin.com; Path=/; Secure; Expires=Tue, 09-Nov-2021 10:50:10 GMT; SameSite=None
set-cookie: lidc="b=VGST04:s=V:r=V:g=2201:u=1:i=1604919010:t=1605005410:v=1:sig=AQHe-KzU8i_5Iy6MwnFEsgRct3c9Lh5R"; Expires=Tue, 10 Nov 2020 10:50:10 GMT; domain=.linkedin.com; Path=/; SameSite=None; Secure
x-fs-txn-id: 2b8d5409ba70
x-fs-uuid: 61bbf94956d14516302567fc882b0000
expect-ct: max-age=86400, report-uri="https://www.linkedin.com/platform-telemetry/ct"
x-xss-protection: 1; mode=block
content-security-policy-report-only: default-src 'none'; connect-src 'self' www.linkedin.com www.google-analytics.com https://dpm.demdex.net/id lnkd.demdex.net blob: https://linkedin.sc.omtrdc.net/b/ss/ static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; script-src 'sha256-THuVhwbXPeTR0HszASqMOnIyxqEgvGyBwSPBKBF/iMc=' 'sha256-PyCXNcEkzRWqbiNr087fizmiBBrq9O6GGD8eV3P09Ik=' 'sha256-2SQ55Erm3CPCb+k03EpNxU9bdV3XL9TnVTriDs7INZ4=' 'sha256-S/KSPe186K/1B0JEjbIXcCdpB97krdzX05S+dHnQjUs=' platform.linkedin.com platform-akam.linkedin.com platform-ecst.linkedin.com platform-azur.linkedin.com static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; img-src data: blob: *; font-src data: *; style-src 'self' 'unsafe-inline' static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com; media-src dms.licdn.com; child-src blob: *; frame-src 'self' lnkd.demdex.net linkedin.cdn.qualaroo.com; manifest-src 'self'; report-uri https://www.linkedin.com/platform-telemetry/csp?f=g
content-security-policy: default-src *; connect-src 'self' https://media-src.linkedin.com/media/ www.linkedin.com s.c.lnkd.licdn.com m.c.lnkd.licdn.com s.c.exp1.licdn.com s.c.exp2.licdn.com m.c.exp1.licdn.com m.c.exp2.licdn.com wss://*.linkedin.com dms.licdn.com https://dpm.demdex.net/id lnkd.demdex.net blob: https://accounts.google.com/gsi/status https://linkedin.sc.omtrdc.net/b/ss/ www.google-analytics.com static.licdn.com static-exp1.licdn.com static-exp2.licdn.com static-exp3.licdn.com media.licdn.com media-exp1.licdn.com media-exp2.licdn.com media-exp3.licdn.com; img-src data: blob: *; font-src data: *; style-src 'unsafe-inline' 'self' static-src.linkedin.com *.licdn.com; script-src 'report-sample' 'unsafe-inline' 'unsafe-eval' 'self' spdy.linkedin.com static-src.linkedin.com *.ads.linkedin.com *.licdn.com static.chartbeat.com www.google-analytics.com ssl.google-analytics.com bcvipva02.rightnowtech.com www.bizographics.com sjs.bizographics.com js.bizographics.com d.la4-c1-was.salesforceliveagent.com slideshare.www.linkedin.com https://snap.licdn.com/li.lms-analytics/ platform.linkedin.com platform-akam.linkedin.com platform-ecst.linkedin.com platform-azur.linkedin.com; object-src 'none'; media-src blob: *; child-src blob: lnkd-communities: voyager: *; frame-ancestors 'self'; report-uri https://www.linkedin.com/platform-telemetry/csp?f=l
x-frame-options: sameorigin
x-content-type-options: nosniff
strict-transport-security: max-age=2592000
x-li-fabric: prod-lva1
x-li-pop: afd-prod-lva1
x-li-proto: http/2
x-li-uuid: Ybv5SVbRRRYwJWf8iCsAAA==
x-msedge-ref: Ref A: CFB9AC1D2B0645DDB161CEE4A4909AEF Ref B: BOM02EDGE0712 Ref C: 2020-11-09T10:50:10Z
date: Mon, 09 Nov 2020 10:50:10 GMT
* Closing connection 0
```
Here my system has the list of certificate authorities it trusts in the file /etc/ssl/cert.pem. curl validates that the certificate is for www.linkedin.com by looking at the CN section of the subject part of the certificate. It also makes sure the certificate is not expired by checking the expiry date. It also validates the signature on the certificate by using the public key of the issuer DigiCert present in /etc/ssl/cert.pem. Once this is done, using the public key of www.linkedin.com it negotiates the cipher TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 with a symmetric key. Subsequent data transfer, including the first HTTP request, uses the same cipher and symmetric key.

Binary files not shown (4 images added: 840 KiB, 18 KiB, 12 KiB, 395 KiB)


@@ -0,0 +1,20 @@
# Linux Networking Fundamentals
## Pre - Reads
This course requires high-level knowledge of commonly used jargon in the TCP/IP stack like DNS, TCP, UDP and HTTP. Basic familiarity with Linux jargon is sufficient to start this course. This course also expects basic exposure to Linux command-line tools. The course will require you to install certain utilities and run them as a part of the course exercises.
## What to expect from this course
Throughout the course, we cover how an SRE can optimize the system to improve web stack performance and troubleshoot if there is an issue in any of the layers of the networking stack. This course tries to dig through each layer of the traditional TCP/IP stack and expects an SRE to have a picture beyond the bird's-eye view of the functioning of the Internet.
## What is not covered under this course
This course spends time on the fundamentals. We are not covering concepts like HTTP/2.0, QUIC, TCP congestion control protocols, Anycast, BGP, CDN, Tunnels and Multicast. We expect that this course will provide the relevant basics to understand such concepts.
## Course Content
### Bird's eye view of the course
The course covers the question “What happens when you open linkedin.com in your browser?” The course follows the flow of the TCP/IP stack. More specifically, the course covers topics of the application layer protocols DNS and HTTP, the transport layer protocols UDP and TCP, the network layer protocol IP and the data link layer protocols.


@@ -0,0 +1,43 @@
# IP Routing and Data Link Layer
We will dig into how packets that leave the client reach the server and vice versa. As the packet moves down the stack, the transport layer populates the source port and destination port, and the IP/network layer populates the destination IP (discovered from DNS) and then looks up the route to the destination IP in the routing table.
```bash
#Linux route -n command gives the default routing table
route -n
```
```bash
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.17.0.1 0.0.0.0 UG 0 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
```
Here the destination IP is bitwise ANDed with the Genmask, and if the answer matches the Destination column of the table, then that gateway and interface is picked for routing. Here linkedin.com's IP 108.174.10.10 is ANDed with 255.255.0.0 and the answer we get is 108.174.0.0, which doesn't match any destination in the routing table. Then Linux does an AND of the destination IP with 0.0.0.0 and we get 0.0.0.0. This answer matches the default row.
The routing table is processed in the order of more bits set to 1 in the genmask (the most specific match first), and genmask 0.0.0.0 is the default route if nothing else matches.
At the end of this operation Linux figures out that the packet has to be sent to the next hop 172.17.0.1 via eth0. The source IP of the packet will be set as the IP of interface eth0.
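A toy sketch of this lookup logic in Python; the routes list mirrors the routing table above and the function applies the same bitwise AND plus longest-prefix ordering:
```python
import ipaddress

# (Destination, Genmask, Gateway) rows from the routing table above
ROUTES = [
    ("0.0.0.0",    "0.0.0.0",     "172.17.0.1"),  # default route via eth0
    ("172.17.0.0", "255.255.0.0", "0.0.0.0"),     # directly connected network
]

def pick_route(dst_ip):
    dst = int(ipaddress.IPv4Address(dst_ip))
    best = None
    for destination, genmask, gateway in ROUTES:
        mask = int(ipaddress.IPv4Address(genmask))
        # bitwise AND of the destination IP with the genmask, compared to the row's destination
        if dst & mask == int(ipaddress.IPv4Address(destination)):
            if best is None or mask > best[0]:   # prefer the longest (most specific) match
                best = (mask, gateway)
    return best[1]

print(pick_route("108.174.10.10"))  # 172.17.0.1 -> send via the default gateway
print(pick_route("172.17.0.5"))     # 0.0.0.0    -> same L2 network, no gateway needed
```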
Now to send the packet to 172.17.0.1, Linux has to figure out the MAC address of 172.17.0.1. The MAC address is found by looking at the internal ARP cache, which stores the translation between IP address and MAC address. If there is a cache miss, Linux broadcasts an ARP request within the internal network asking who has 172.17.0.1. The owner of the IP sends an ARP response, which is cached by the kernel, and the kernel sends the packet to the gateway by setting the source MAC address to the MAC address of eth0 and the destination MAC address to that of 172.17.0.1 which we just got. A similar routing lookup process is followed at each hop till the packet reaches the actual server. The transport layer and the layers above it come into play only at the end servers. During intermediate hops only the layers up to the IP/network layer are involved.
![Screengrab for above explanation](images/arp.gif)
One weird gateway we saw in the routing table is 0.0.0.0. This gateway means no Layer 3 (network layer) hop is needed to send the packet: both source and destination are in the same network. The kernel has to figure out the MAC of the destination, populate the source and destination MACs appropriately and send the packet out so that it reaches the destination without any Layer 3 hop in the middle.
As we did in other modules, let's complete this section with SRE use cases.
## Applications in SRE role
1. Generally the routing table is populated by DHCP and playing around with it is not a good practice. There can be reasons where one has to play around with the routing table, but take that path only when it's absolutely necessary.
2. Understanding error messages better: for example, a “No route to host” error can mean the MAC address of the destination host was not found, which can in turn mean the destination host is down.
3. In rare cases, looking at the ARP table can help us understand if there is an IP conflict where the same IP is assigned to two hosts by mistake, causing unexpected behavior.
## Conclusion
With this we have traversed the TCP/IP stack completely. We hope you will have a different perspective when you open any website in the browser after this course.
During the course we have also dissected the common tasks in this pipeline which fall under the ambit of an SRE.
# Post Training Exercises
1. Set up your own DNS resolver in the dev environment which acts as an authoritative DNS server for example.com and a forwarder for other domains. Update resolv.conf to use the new DNS resolver running on localhost.
2. Set up a site dummy.example.com on localhost and run a webserver with a self-signed certificate. Update the trusted CAs or pass the self-signed CA's public key as a parameter so that curl https://dummy.example.com -v works properly without a self-signed certificate warning.
3. Update the routing table to use another host (container/VM) in the same network as a gateway for 8.8.8.8/32 and run ping 8.8.8.8. Do a packet capture on the new gateway to see that the L3 hop is working as expected (you might need to disable icmp_redirect).


@@ -0,0 +1,35 @@
# TCP
TCP is a transport layer protocol like UDP but it guarantees reliability, flow control and congestion control.
TCP guarantees reliable delivery by using sequence numbers. A TCP connection is established by a three-way handshake. In our case, the client sends a SYN packet along with the starting sequence number it plans to use, and the server acknowledges the SYN packet and sends a SYN with its own sequence number. Once the client acknowledges the server's SYN packet, the connection is established. Each piece of data transferred from here on is considered delivered reliably once the acknowledgement for that sequence is received by the concerned party.
![3-way handshake](images/established.png)
```bash
#To understand handshake run packet capture on one bash session
tcpdump -S -i any port 80
#Run curl on one bash session
curl www.linkedin.com
```
![tcpdump-3way](images/pcap.png)
Here the client sends a SYN, shown by the [S] flag, with sequence number 1522264672. The server acknowledges receipt of the SYN with an ACK [.] flag and a SYN flag [S] for its own sequence number. The server uses sequence number 1063230400 and tells the client it is expecting sequence number 1522264673 (client sequence + 1). The client sends a zero-length acknowledgement packet to the server (server sequence + 1) and the connection stands established. This is called the three-way handshake. The client then sends a 76-byte packet and increments its sequence number by 76. The server sends a 170-byte response and closes the connection. This is the difference we were talking about between HTTP/1.1 and HTTP/1.0: in HTTP/1.1 this same connection can be reused, which reduces the overhead of a three-way handshake for each HTTP request. If a packet is lost between client and server, the server won't send an ACK to the client, and the client retries sending the packet till the ACK is received. This guarantees reliability.
Flow control is established by the window size field in each segment. The window size advertises the available TCP buffer length in the kernel which can be used to buffer received segments. A size of 0 means the receiver still has a backlog to consume from its socket buffer and the sender has to pause sending packets so that the receiver can cope. This flow control protects against the slow receiver / fast sender problem.
TCP also does congestion control which determines how many segments can be in transit without an ack. Linux provides us the ability to configure algorithms for congestion control which we are not covering here.
While closing a connection, the client/server calls the close syscall. Let's assume the client does that. The client's kernel will send a FIN packet to the server. The server's kernel can't close the connection till the close syscall is called by the server application. Once the server app calls close, the server also sends a FIN packet, and the client enters the TIME_WAIT state for 2*MSL (120s) so that this socket can't be reused for that time period, to prevent any TCP state corruption due to stray stale packets.
![Connection tearing](images/closed.png)
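A minimal sketch of the same lifecycle from the application's point of view, using Python's socket module: connect() triggers the three-way handshake, send/recv ride on the established connection, and close() starts the FIN exchange:
```python
import socket

# connect() underneath performs the SYN / SYN-ACK / ACK handshake captured above
sock = socket.create_connection(("www.linkedin.com", 80))
# Data now flows on the established connection, tracked by sequence numbers and ACKs
sock.sendall(b"GET / HTTP/1.1\r\nHost: www.linkedin.com\r\nConnection: close\r\n\r\n")
print(sock.recv(4096).decode(errors="replace").splitlines()[0])  # first line: status
# close() sends FIN; the side that closes first ends up in TIME_WAIT
sock.close()
```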
Armed with our TCP and HTTP knowledge, let's see how this is used by SREs in their role.
## Applications in SRE role
1. Scaling HTTP performance using load balancers needs consistent knowledge about both TCP and HTTP. There are [different kinds of load balancing](https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236?gi=428394dbdcc3) like L4 load balancing, L7 load balancing, Direct Server Return etc. HTTPS offloading can be done on the load balancer or directly on the servers based on the performance and compliance needs.
2. Tweaking sysctl variables for rmem and wmem like we did for UDP can improve the throughput of sender and receiver.
3. The sysctl variable tcp_max_syn_backlog and the socket variable somaxconn determine how many connections the kernel can complete the 3-way handshake for before the app calls the accept syscall. This is very useful in single-threaded applications. Once the backlog is full, new connections stay in the SYN_RCVD state (when you run netstat) till the application calls the accept syscall (a quick way to inspect these knobs is sketched after this list).
4. Apps can run out of file descriptors if there are too many short-lived connections. Digging through [tcp_tw_reuse and tcp_tw_recycle](http://lxr.linux.no/linux+v3.2.8/Documentation/networking/ip-sysctl.txt#L464) can help reduce the time spent in the TIME_WAIT state (it has its own risks). Making apps reuse a pool of connections instead of creating ad hoc connections can also help.
5. Understanding performance bottlenecks by looking at metrics and classifying whether it's a problem on the app or the network side. For example, too many sockets in CLOSE_WAIT state point to a problem in the application, whereas retransmissions point more to a problem in the network or the OS stack than in the application itself. Understanding the fundamentals can help us narrow down where the bottleneck is.
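As mentioned in point 3, these knobs are easy to inspect; a minimal sketch that reads them straight from /proc on a Linux box:
```python
# Read the backlog-related kernel knobs mentioned above (Linux only)
for knob in ("/proc/sys/net/ipv4/tcp_max_syn_backlog",
             "/proc/sys/net/core/somaxconn"):
    with open(knob) as f:
        print(knob, "=", f.read().strip())
```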


@@ -0,0 +1,15 @@
# UDP
UDP is a transport layer protocol. DNS is an application layer protocol that runs on top of UDP (most of the time). Before jumping into UDP, let's try to understand what an application and transport layer is. The DNS protocol is used by a DNS client (eg dig) and a DNS server (eg named). The transport layer makes sure the DNS request reaches the DNS server process and similarly the response reaches the DNS client process. Multiple processes can run on a system and they can listen on any [ports](https://en.wikipedia.org/wiki/Port_(computer_networking)). DNS servers usually listen on port number 53. When a client makes a DNS request, after filling the necessary application payload, it passes the payload to the kernel via the **sendto** system call. The kernel picks a random port number ([>1024](https://www.cyberciti.biz/tips/linux-increase-outgoing-network-sockets-range.html)) as the source port number and puts 53 as the destination port number and sends the packet to the lower layers. When the kernel on the server side receives the packet, it checks the port number and queues the packet to the application buffer of the DNS server process, which makes a **recvfrom** system call and reads the packet. This process by the kernel is called multiplexing (combining packets from multiple applications into the same lower layers) and demultiplexing (segregating packets from a single lower layer to multiple applications). Multiplexing and demultiplexing are done by the transport layer.
UDP is one of the simplest transport layer protocols and it does only multiplexing and demultiplexing. Another common transport layer protocol, TCP, does a bunch of other things like reliable communication, flow control and congestion control. UDP is designed to be lightweight and handle communications with little overhead, so it doesn't do anything beyond multiplexing and demultiplexing. If applications running on top of UDP need any of the features of TCP, they have to implement them in their application.
This [example from the python wiki](https://wiki.python.org/moin/UdpCommunication) covers a sample UDP client and server where “Hello World” is an application payload sent to a server listening on port number 5005. The server receives the packet and prints the “Hello World” string from the client.
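For reference, a minimal self-contained sketch along the same lines (both ends in one script, port 5005 as in the linked example):
```python
import socket

UDP_IP, UDP_PORT = "127.0.0.1", 5005

# Server socket: bind to the well-known port and wait for datagrams
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind((UDP_IP, UDP_PORT))

# Client socket: sendto hands the payload to the kernel, which picks an ephemeral source port
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"Hello World", (UDP_IP, UDP_PORT))

# The kernel demultiplexes the datagram to this process based on destination port 5005
data, addr = server.recvfrom(1024)
print("received:", data.decode(), "from", addr)
```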
## Applications in SRE role
1. If the underlying network is slow and the UDP layer is unable to queue packets down to the networking layer, the sendto syscall from the application will hang till the kernel finds that some of its buffer has been freed. This can affect the throughput of the system. Increasing the write memory buffer values using the [sysctl variables](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/tuning_and_optimizing_red_hat_enterprise_linux_for_oracle_9i_and_10g_databases/sect-oracle_9i_and_10g_tuning_guide-adjusting_network_settings-changing_network_kernel_settings) *net.core.wmem_max* and *net.core.wmem_default* provides some cushion to the application from the slow network.
2. Similarly, if the receiver process is slow in consuming from its buffer, the kernel has to drop packets which it can't queue due to the buffer being full. Since UDP doesn't guarantee reliability, these dropped packets can cause data loss unless tracked by the application layer. Increasing the sysctl variables *rmem_default* and *rmem_max* can provide some cushion to slow applications from fast senders (a per-socket knob for the same is sketched after this list).
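As noted in point 2, besides the system-wide sysctls an individual socket can also request a larger receive buffer; a minimal sketch (the kernel caps the request at net.core.rmem_max for unprivileged processes):
```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Ask for a 4 MiB receive buffer; the kernel will cap this at net.core.rmem_max
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)
print("effective receive buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```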


@@ -6,6 +6,13 @@ nav:
- Git Basics: git/git-basics.md
- Working With Branches: git/branches.md
- Github and Hooks: git/github-hooks.md
- Linux Networking:
- Intro: linux_networking/intro.md
- DNS: linux_networking/dns.md
- UDP: linux_networking/udp.md
- HTTP: linux_networking/http.md
- TCP: linux_networking/tcp.md
- Routing and Conclusion: linux_networking/ipr.md
- Python and Web:
- Intro: python_web/intro.md
- Some Python Concepts: python_web/python-concepts.md