# A Comprehensive Guide to the Web's Core Protocol, HTTP for Hacker & Developers

# Overview of HTTP

Hypertext Transfer Protocol (HTTP) is an **application-layer protocol** or layer 7 protocol on OSI model. It was introduced in 1991 by Sir Tim Berners-Lee, British computer scientist who also created the World Wide Web (WWW). It was designed for transmitting hypermedia documents over the internet, such as HTML. HTTP protocol defines set of rules for how data should be formatted and transmitted between the client and server. This ensures reliable and standardized communication. A crucial point to understand that, **HTTP is a stateless protocol** that means each request from a client to a server is independent and unrelated to any previous requests because of that the server does not keep any data (state) between two requests. More on that in cookies section.

## Client-Server Architecture

HTTP protocol follows *Client-Server* architecture, with a client (e.g., web browser) opening a connection to make a request, then waiting until it receives a response from server or request get timeout. HTTP operates on **Transmission Control Protocol (TCP)** which ensure reliable, ordered, error-checked delivery and completeness of data transmission.

* **Client:**
    
    * Web Browsers, Programming Languages, or Many App That Can Make HTTP Request such as, Curl.
        
* **Server:**
    
    * Well-Known HTTP Web Servers e.g. - Nginx, IIS, NodeJS, Apache Tomcat etc.
        

# Importance of Understanding HTTP

HTTP is a core foundation of today's technological world. As a programmer and hacker, HTTP protocol gives you understanding of writing better web-based applications and secure applications too. It gives you a distinct view of looking a website and debug them.

Let's understand with an example, Do you know that you can access different resources or perform different actions if I make a request to same URI like [`flarexes.com/action`](http://flarexes.com/action). You may be thing but how? Different response from same request URI? This is possible via various ways but common one is **HTTP Methods**. I will cover them later. But sometimes programmers don't restrict allowed HTTP methods and this become a dangling point where a hacker you can potentially find a enter point.

Having basic understanding of HTTP is invaluable for **Web Development**, **API Development**, **Performance Optimization**, **Security**, **Troubleshooting and Debugging**.

# Fundamentals of HTTP

One good thing about this protocol is, it's super simple to understand. I recommend you going through it's RFCs when you have time. It's just a formatted text that has been standardized.

From client-server architecture section, we know that HTTP has two main components. First, Request that client makes and second, response that server sends. So, Let's understand both of them. We'll also take a seek-peek of "how this standardized format looks like?".

## HTTP Request

An HTTP request is made by a *client to server* asking to access a resource on the server.

A correctly composed HTTP request contains the following elements:

1. **Method**
    
    * HTTP supports a set of methods, which defines the action to be performed on a given resource. e.g. - **GET, POST, DELETE, PUT, OPTIONS**.
        
2. **Request-URI**
    
    * A Request-URI locates to an existing resource on the internet, such as HTML or Image. e.g. - **/path/images/cat.png**.
        
3. **HTTP Version**
    
    * HTTP version indicates server which version of HTTP client is using. This wasn't mandatory in HTTP/0.9 and HTTP/1.0. It is important to note it allow server to respond appropriately otherwise it might cause issues because newer versions of HTTP has features that previous once don't support.
        
4. **Headers**
    
    * Headers let the client and server transmit additional information with an HTTP request or response. There are many standard HTTP headers defined in HTTP specification. Like `Host` which is a mandatory field except for **HTTP/0.9**. But client and server also add custom headers like `x-token` or anything meaningful.
        
5. **Blank Line**
    
    * To indicate end of headers (if there is a body, an optional HTTP specification).
        

Example of simplest possible HTTP request:

```http
GET /resourse HTTP/1.1
Host: api.example.com
```

## HTTP Response

An HTTP response is made by the *server to client* in response to it's request. A response contains the requested resource with other information which client can interpret.

A common composed HTTP response contains the following elements:

1. **HTTP Version**
    
    * Same as HTTP request from client, server also send HTTP version to client indicating which version of HTTP server is using. To avoid any conflicts in future.
        
2. **Status Code**
    
    * A status code is a three-digit number indicating the result type of the response, was it success, fail or unauthorized.
        
3. **Status Text**
    
    * A status text is human-readable text that summaries the meaning of the status code.
        
        > Combination of *HTTP Version*, *Status Code*, *Status Text* is also known as **Status Line**.
        
    * You can learn more about HTTP status codes MDN Documentation link below. Mainly, HTTP response are grouped in five classes:
        
        1. [Informational responses](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#information_responses) (`100`–`199`)
            
        2. [Successful responses](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#successful_responses) (`200`–`299`)
            
        3. [Redirection messages](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#redirection_messages) (`300`–`399`)
            
        4. [Client error responses](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#client_error_responses) (`400`–`499`)
            
        5. [Server error responses](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#server_error_responses) (`500`–`599`)
            
4. **HTTP Header**
    
    * HTTP response header contain information that client can use to learn more about the response type and server. Below is an example of HTTP response headers which tells client, when response was sent, it was sent by GWS and that was a JPEG image:
        

```http
Date: Sat, 08 Jun 2024 12:07:48 GMT
Server: gws
Content-type: image/jpg
```

5. **Body**
    
    * The message body in an HTTP response contain either the requested resource or additional information about the status of the action requested by the client. From out above example on headers, this could be an image.
        
    * Most responses have body except when client has used **HEAD Method**. Which request server to response headers without body. This come handle while debugging unexpected results.
        

## Hands-on Practical Lab

There are hundreds of softwares, tools and utilities that allow users to send and receive HTTP headers. So, I'll first list few of my favourite once but later on I will be going with `curl`. Curl is wide-adopted command-line utility for interacting with HTTP and other protocols too. It's comes pre-installed in major UNIX/LINUX based operating system and can be installed in windows too.

1. **Curl:** Simple, easy, gets the job done.
    
2. **Python:** Complete control over every aspect of the HTTP protocol.
    
3. **Postman:** An another powerful API testing client known for its rich feature set.
    
4. **HTTPie:** API testing client with cli & gui versions. You can also try httpie online from [HTTPie for Web](https://httpie.io/app).
    
5. **Burp Suite:** Hacker's favourite security testing software with extensive features including custom HTTP request capabilities.
    

### Show Request and Response Headers

Just looking at output below, you can tell alot of things. Client is requesting `/` or root of `Host` via `GET` method. Also request is contracted from curl. Server respond with the same version of HTTP that client used **HTTP/1.1** with status code `301` which means requested resource is moved to `Location:` [`http://www.google.com/`](http://www.google.com/).

It's an interesting observation that you won't see while browsing [`google.com`](http://google.com) in web-browser. Google actually doesn't host anything on [`google.com`](http://google.com), instead they redirect you to [`www.google.com`](http://www.google.com).

```bash
$ curl -v google.com

> GET / HTTP/1.1
> Host: google.com
> User-Agent: curl/8.8.0
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/
< Server: gws
< Content-Length: 219
```

### Show Request and Response Headers with Redirects

Below command follow all redirects till it keeping getting **Location** header in server's HTTP Response.

```bash
$ curl -L -v google.com

> GET / HTTP/1.1
> Host: google.com
> User-Agent: curl/8.8.0
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/
<
> GET / HTTP/1.1
> Host: www.google.com
> User-Agent: curl/8.8.0
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 200 OK
< Date: Sat, 08 Jun 2024 10:57:17 GMT
< Expires: -1
< Cache-Control: private, max-age=0
< Content-Type: text/html; charset=ISO-8859-1
```

### Send Request From Specific HTTP Version

Keeping in mind not all websites supports ever HTTP versions. Let's explore the fact, in HTTP transmission both client and server use same HTTP version if they can.

```bash
# HTTP/0.9 Version 
$ curl --head --http0.9 https://vercel.com

HTTP/2 200
server: Vercel
```

As you see, when I make a HTTP GET request from **HTTP/0.9** version, server replied with **HTTP/2** that show vercel doesn't support HTTP/0.9 version. Curl's default request method is **GET** and `--head` means only show HTTP response header.

Now, Try these commands and check HTTP version vercel replys with. It would be same as request.

```bash
# HTTP/1.1 Version 
$ curl --head --http1.1 https://vercel.com

# HTTP/2 Version 
$ curl --head --http2 https://vercel.com
```

# Evolution of HTTP Versions

As far we have got the basic understanding of HTTP. However it's also important to know how it came so far. From a one line of protocol to spreading all over the internet, starting with **HTTP/0.9**.

## HTTP/0.9 – The One Line Protocol

HTTP/0.9 was the simplest version of HTTP protocol. The request consists only possible method **GET**, followed by the path to the desired resource. There were no HTTP headers. This meant server can only respond with HTML files. After server every response connection get terminated.

```http
GET /mypage.html
```

**Key Points:**

* **Request Format:** GET Method + Path of the resource
    
* **Response Content:** Limited to hypertext files
    
* **Supported Methods:** Only *GET*
    
* **Connection Nature:** Terminated once server dispatch response
    
* **Limitation:** No Headers, No Status Codes, No URLs, No Multimedia Files
    

## HTTP/1.0 - The Building Block

[RFC 1945 Outlined The HTTP 1.0 Protocol](https://datatracker.ietf.org/doc/html/rfc1945)

To overcome the limitation of **HTTP/0.9**, a newer version **HTTP/1.0** was introduced after few years. Established connections in HTTP/1.0 are **short-lived**. That means, a new connection created for each request and closed once response had been received. Therefore HTTP/1.0 is **Non-Persistent Connections.**

This was okay till modern web-pages started to get complex and required many requests to serve the page. HTTP/0.9 and HTTP/1.0 both are **short-lived connections** and **Non-Persistent Connections.** It lead to a different problem of **Three-Way Handshake**. HTTP is based on TCP that means a TCP handshake happens before each HTTP request and this in itself was time-consuming and resources heavy especially in 90's.

Example of HTTP/1.0 Request :

```http
GET /index.html HTTP/1.0 
Host: www.example.com 
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Accept: text/html,application/xml,image/jpeg
```

Example of HTTP/1.0 Response :

```http
HTTP/1.0 200 OK
Date: Mon, 10 Jun 2024 12:00:00 GMT
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)
Content-Type: text/html

<html>
A page with an image
  <img SRC="/image.jpg">
</html>
```

Followed second connection will also fetch image.

```http
GET /image.jpg HTTP/1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
```

```http
HTTP/1.0 200 OK
Date: Mon, 10 Jun 2024 12:00:04 GMT
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)
Content-Type: text/jpeg
(image content)
```

**Key Points:**

* **Request:** Rich Metadata, Headers, HTTP Version, Status Code, Multimedia etc
    
* **Response:** Not limited to hypertext
    
* **Methods:** GET , HEAD , POST
    
* **Connection Nature:** Short-Lived
    
* **Limitations:** CPU Overhead, Buffering and Redundant Requests (handshake for each request)
    

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1735143642292/4d616c0f-b3f6-45ee-b93d-ce1cb3a6e1b6.png align="center")

## HTTP/1.1 - The Standardized Protocol

[RFC 2068](https://datatracker.ietf.org/doc/html/rfc2068) Outlined The HTTP 1.1 Protocol

The first standardized HTTP protocol, **HTTP/1.1** was released in 1997. Since then it had been gone through two revisions, first in 1999 and second one in 2014. These changes were defined in **RFC 2616** and **RFC 7230** to **RFC 7235**. But the first official HTTP/1.1 standard is defined in [RFC 2068](https://datatracker.ietf.org/doc/html/rfc2068), which was officially released in January 1997, roughly seven months after the publication of HTTP/1.0. It's been over two decades still HTTP/1.1 is most used HTTP version.

HTTP/1.1 introduced numerous improvement and overcame issues such as *non-persistence connection* and *TCP handshake overhead*. A **Keep-Alive** header was added to HTTP/1.1 specification. This header tells the server to not to close the connection. Therefore HTTP/1.1 is **Persistent Connections**. *Persistent HTTP* connections are designed to allow multiple request/response exchange without establishing a new connection every time.

HTTP/1.1 also introduced **Caching** mechanisms and **Pipelining**. HTTP pipelining is a technique that allows multiple HTTP requests to be sent over a single TCP connection without waiting for each response before sending the next request.

**Key Points:**

* **Methods:** GET , HEAD , POST , PUT , DELETE , TRACE , OPTIONS
    
* **Connection Nature:** Long-Lived
    
* **Attack Surface:** Prone to DOS Attack
    
* **New Features:** Pipelining , Caching , Persisted Connection
    
* **Advantages:** - Reduced Network Congestion, Chucked Transfer, Lower Resources Usage
    

## HTTP/2.0 and HTTP/3.0 (HTTP Over QUIC)

I will skip over from these two protocols. They both were designed to improve the performance of complex web-applications such as YouTube. HTTP/3 is still experimental but used by many web-applications. Story of these protocols is fascinating too but nothing much add here except the part now HTTP is moving to **UDP With Congestion Control** instead of TCP. You can easily spot QUIC protocol in Wireshark capture on youtube. However, you can read about them from reference section. And If you want me to cover it comment below or on my socials, I'll update the article.

**HTTP/1.1 and HTTP/2 performance demonstration**.

%[https://www.youtube.com/watch?v=gqUCeGkTYjY] 

# Keep-Alive: Hands-on Practical Lab

To see how **HTTP/1.1** handles network congestion with **Keep-Alive** header; we need two things.

1. Active ***HTTP/1.0*** and ***HTTP/1.1*** servers to compare the difference
    
2. Wireshark to capture and analyze traffic
    

### Analyzing Request-Response Behavior in HTTP/1.0

**Step 1:** Create an html file named `index.html` and save an image under same directory named `./4k.jpg`.

```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    <h1>This is me</h1>
    <img src="./4k.jpg" alt="">
</body>
</html>
```

**Step 2:** Simplest way to start an HTTP/1.0 server is using Python.

```bash
python -m http.server 8000 --protocol=http/1.0
```

This command will spin up a simple http server on port 8000 with specified version in this case **HTTP/1.0**.

**Step 3:** Verify server is responding in HTTP/1.0.

```bash
curl --head http://0.0.0.0:8000/

http/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.12.3
```

**Step 4:** Spin up Wireshark and start capturing traffic on [localhost](http://localhost) interface ***Loopback: lo***. Then visit the website.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1734348629524/2712863a-30e8-4981-b6db-77503704b81d.png align="center")

I redacted the unnecessary info but in boxes as you see. Their is a TCP handshake before each request. Now, In next section we will see how does HTTP/1.1 react to this situation.

### Analyzing Request-Response Behavior in HTTP/1.1

**Step 1:** Let's restart http server again on same directory with same code but on **HTTP/1.1** version.

```bash
python -m http.server 8000 --protocol=http/1.1
```

**Step 2:** Verify server is responding in HTTP/1.1.

```bash
curl --head http://0.0.0.0:8000/

http/1.1 200 OK
Server: SimpleHTTP/0.6 Python/3.12.3
```

**Step 4:** Spin up Wireshark and start capturing traffic on [localhost](http://localhost) interface ***Loopback: lo***. Then visit the website.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1734348659815/edcb776a-2fb1-4392-a252-4a329b1ae5f6.png align="center")

As you can see their was only one TCP handshake for both GET requests. And, In bottom-left you can also spot `Connection: keep-alive` in GET request headers. Now you know the difference between **HTTP/1.0** and **HTTP/1.1**. You can also test it by yourself; you don't have to trust on theories anymore.

I hope you found this help. In part-2 of this blog post. I'll go more into hacking side where we will explore, "How hacker hack youtube channels?" or "Why cookies were made? to spy on people?" then we might see a practical demonstration too as I always try to do.

Thanks for reading. I hope to see you in part-2.
