Skip to main content

Command Palette

Search for a command to run...

A Comprehensive Guide to the Web's Core Protocol, HTTP for Hacker & Developers

Updated
12 min read
A Comprehensive Guide to the Web's Core Protocol, HTTP for Hacker & Developers
F
I’m FlareXes. I work around Linux, networking, security, and internet infrastructure as part of my job. University was mostly trash, honestly. The people were good. The content wasn’t. Staying didn’t make sense, so I left.

Overview of HTTP

Hypertext Transfer Protocol (HTTP) is an application-layer protocol or layer 7 protocol on OSI model. It was introduced in 1991 by Sir Tim Berners-Lee, British computer scientist who also created the World Wide Web (WWW). It was designed for transmitting hypermedia documents over the internet, such as HTML. HTTP protocol defines set of rules for how data should be formatted and transmitted between the client and server. This ensures reliable and standardized communication. A crucial point to understand that, HTTP is a stateless protocol that means each request from a client to a server is independent and unrelated to any previous requests because of that the server does not keep any data (state) between two requests. More on that in cookies section.

Client-Server Architecture

HTTP protocol follows Client-Server architecture, with a client (e.g., web browser) opening a connection to make a request, then waiting until it receives a response from server or request get timeout. HTTP operates on Transmission Control Protocol (TCP) which ensure reliable, ordered, error-checked delivery and completeness of data transmission.

  • Client:

    • Web Browsers, Programming Languages, or Many App That Can Make HTTP Request such as, Curl.
  • Server:

    • Well-Known HTTP Web Servers e.g. - Nginx, IIS, NodeJS, Apache Tomcat etc.

Importance of Understanding HTTP

HTTP is a core foundation of today's technological world. As a programmer and hacker, HTTP protocol gives you understanding of writing better web-based applications and secure applications too. It gives you a distinct view of looking a website and debug them.

Let's understand with an example, Do you know that you can access different resources or perform different actions if I make a request to same URI like flarexes.com/action. You may be thing but how? Different response from same request URI? This is possible via various ways but common one is HTTP Methods. I will cover them later. But sometimes programmers don't restrict allowed HTTP methods and this become a dangling point where a hacker you can potentially find a enter point.

Having basic understanding of HTTP is invaluable for Web Development, API Development, Performance Optimization, Security, Troubleshooting and Debugging.

Fundamentals of HTTP

One good thing about this protocol is, it's super simple to understand. I recommend you going through it's RFCs when you have time. It's just a formatted text that has been standardized.

From client-server architecture section, we know that HTTP has two main components. First, Request that client makes and second, response that server sends. So, Let's understand both of them. We'll also take a seek-peek of "how this standardized format looks like?".

HTTP Request

An HTTP request is made by a client to server asking to access a resource on the server.

A correctly composed HTTP request contains the following elements:

  1. Method

    • HTTP supports a set of methods, which defines the action to be performed on a given resource. e.g. - GET, POST, DELETE, PUT, OPTIONS.
  2. Request-URI

    • A Request-URI locates to an existing resource on the internet, such as HTML or Image. e.g. - /path/images/cat.png.
  3. HTTP Version

    • HTTP version indicates server which version of HTTP client is using. This wasn't mandatory in HTTP/0.9 and HTTP/1.0. It is important to note it allow server to respond appropriately otherwise it might cause issues because newer versions of HTTP has features that previous once don't support.
  4. Headers

    • Headers let the client and server transmit additional information with an HTTP request or response. There are many standard HTTP headers defined in HTTP specification. Like Host which is a mandatory field except for HTTP/0.9. But client and server also add custom headers like x-token or anything meaningful.
  5. Blank Line

    • To indicate end of headers (if there is a body, an optional HTTP specification).

Example of simplest possible HTTP request:

GET /resourse HTTP/1.1
Host: api.example.com

HTTP Response

An HTTP response is made by the server to client in response to it's request. A response contains the requested resource with other information which client can interpret.

A common composed HTTP response contains the following elements:

  1. HTTP Version

    • Same as HTTP request from client, server also send HTTP version to client indicating which version of HTTP server is using. To avoid any conflicts in future.
  2. Status Code

    • A status code is a three-digit number indicating the result type of the response, was it success, fail or unauthorized.
  3. Status Text

  4. HTTP Header

    • HTTP response header contain information that client can use to learn more about the response type and server. Below is an example of HTTP response headers which tells client, when response was sent, it was sent by GWS and that was a JPEG image:
Date: Sat, 08 Jun 2024 12:07:48 GMT
Server: gws
Content-type: image/jpg
  1. Body

    • The message body in an HTTP response contain either the requested resource or additional information about the status of the action requested by the client. From out above example on headers, this could be an image.

    • Most responses have body except when client has used HEAD Method. Which request server to response headers without body. This come handle while debugging unexpected results.

Hands-on Practical Lab

There are hundreds of softwares, tools and utilities that allow users to send and receive HTTP headers. So, I'll first list few of my favourite once but later on I will be going with curl. Curl is wide-adopted command-line utility for interacting with HTTP and other protocols too. It's comes pre-installed in major UNIX/LINUX based operating system and can be installed in windows too.

  1. Curl: Simple, easy, gets the job done.

  2. Python: Complete control over every aspect of the HTTP protocol.

  3. Postman: An another powerful API testing client known for its rich feature set.

  4. HTTPie: API testing client with cli & gui versions. You can also try httpie online from HTTPie for Web.

  5. Burp Suite: Hacker's favourite security testing software with extensive features including custom HTTP request capabilities.

Show Request and Response Headers

Just looking at output below, you can tell alot of things. Client is requesting / or root of Host via GET method. Also request is contracted from curl. Server respond with the same version of HTTP that client used HTTP/1.1 with status code 301 which means requested resource is moved to Location: http://www.google.com/.

It's an interesting observation that you won't see while browsing google.com in web-browser. Google actually doesn't host anything on google.com, instead they redirect you to www.google.com.

$ curl -v google.com

> GET / HTTP/1.1
> Host: google.com
> User-Agent: curl/8.8.0
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/
< Server: gws
< Content-Length: 219

Show Request and Response Headers with Redirects

Below command follow all redirects till it keeping getting Location header in server's HTTP Response.

$ curl -L -v google.com

> GET / HTTP/1.1
> Host: google.com
> User-Agent: curl/8.8.0
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/
<
> GET / HTTP/1.1
> Host: www.google.com
> User-Agent: curl/8.8.0
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 200 OK
< Date: Sat, 08 Jun 2024 10:57:17 GMT
< Expires: -1
< Cache-Control: private, max-age=0
< Content-Type: text/html; charset=ISO-8859-1

Send Request From Specific HTTP Version

Keeping in mind not all websites supports ever HTTP versions. Let's explore the fact, in HTTP transmission both client and server use same HTTP version if they can.

# HTTP/0.9 Version 
$ curl --head --http0.9 https://vercel.com

HTTP/2 200
server: Vercel

As you see, when I make a HTTP GET request from HTTP/0.9 version, server replied with HTTP/2 that show vercel doesn't support HTTP/0.9 version. Curl's default request method is GET and --head means only show HTTP response header.

Now, Try these commands and check HTTP version vercel replys with. It would be same as request.

# HTTP/1.1 Version 
$ curl --head --http1.1 https://vercel.com

# HTTP/2 Version 
$ curl --head --http2 https://vercel.com

Evolution of HTTP Versions

As far we have got the basic understanding of HTTP. However it's also important to know how it came so far. From a one line of protocol to spreading all over the internet, starting with HTTP/0.9.

HTTP/0.9 – The One Line Protocol

HTTP/0.9 was the simplest version of HTTP protocol. The request consists only possible method GET, followed by the path to the desired resource. There were no HTTP headers. This meant server can only respond with HTML files. After server every response connection get terminated.

GET /mypage.html

Key Points:

  • Request Format: GET Method + Path of the resource

  • Response Content: Limited to hypertext files

  • Supported Methods: Only GET

  • Connection Nature: Terminated once server dispatch response

  • Limitation: No Headers, No Status Codes, No URLs, No Multimedia Files

HTTP/1.0 - The Building Block

RFC 1945 Outlined The HTTP 1.0 Protocol

To overcome the limitation of HTTP/0.9, a newer version HTTP/1.0 was introduced after few years. Established connections in HTTP/1.0 are short-lived. That means, a new connection created for each request and closed once response had been received. Therefore HTTP/1.0 is Non-Persistent Connections.

This was okay till modern web-pages started to get complex and required many requests to serve the page. HTTP/0.9 and HTTP/1.0 both are short-lived connections and Non-Persistent Connections. It lead to a different problem of Three-Way Handshake. HTTP is based on TCP that means a TCP handshake happens before each HTTP request and this in itself was time-consuming and resources heavy especially in 90's.

Example of HTTP/1.0 Request :

GET /index.html HTTP/1.0 
Host: www.example.com 
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Accept: text/html,application/xml,image/jpeg

Example of HTTP/1.0 Response :

HTTP/1.0 200 OK
Date: Mon, 10 Jun 2024 12:00:00 GMT
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)
Content-Type: text/html

<html>
A page with an image
  <img SRC="/image.jpg">
</html>

Followed second connection will also fetch image.

GET /image.jpg HTTP/1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
HTTP/1.0 200 OK
Date: Mon, 10 Jun 2024 12:00:04 GMT
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)
Content-Type: text/jpeg
(image content)

Key Points:

  • Request: Rich Metadata, Headers, HTTP Version, Status Code, Multimedia etc

  • Response: Not limited to hypertext

  • Methods: GET , HEAD , POST

  • Connection Nature: Short-Lived

  • Limitations: CPU Overhead, Buffering and Redundant Requests (handshake for each request)

HTTP/1.1 - The Standardized Protocol

RFC 2068 Outlined The HTTP 1.1 Protocol

The first standardized HTTP protocol, HTTP/1.1 was released in 1997. Since then it had been gone through two revisions, first in 1999 and second one in 2014. These changes were defined in RFC 2616 and RFC 7230 to RFC 7235. But the first official HTTP/1.1 standard is defined in RFC 2068, which was officially released in January 1997, roughly seven months after the publication of HTTP/1.0. It's been over two decades still HTTP/1.1 is most used HTTP version.

HTTP/1.1 introduced numerous improvement and overcame issues such as non-persistence connection and TCP handshake overhead. A Keep-Alive header was added to HTTP/1.1 specification. This header tells the server to not to close the connection. Therefore HTTP/1.1 is Persistent Connections. Persistent HTTP connections are designed to allow multiple request/response exchange without establishing a new connection every time.

HTTP/1.1 also introduced Caching mechanisms and Pipelining. HTTP pipelining is a technique that allows multiple HTTP requests to be sent over a single TCP connection without waiting for each response before sending the next request.

Key Points:

  • Methods: GET , HEAD , POST , PUT , DELETE , TRACE , OPTIONS

  • Connection Nature: Long-Lived

  • Attack Surface: Prone to DOS Attack

  • New Features: Pipelining , Caching , Persisted Connection

  • Advantages: - Reduced Network Congestion, Chucked Transfer, Lower Resources Usage

HTTP/2.0 and HTTP/3.0 (HTTP Over QUIC)

I will skip over from these two protocols. They both were designed to improve the performance of complex web-applications such as YouTube. HTTP/3 is still experimental but used by many web-applications. Story of these protocols is fascinating too but nothing much add here except the part now HTTP is moving to UDP With Congestion Control instead of TCP. You can easily spot QUIC protocol in Wireshark capture on youtube. However, you can read about them from reference section. And If you want me to cover it comment below or on my socials, I'll update the article.

HTTP/1.1 and HTTP/2 performance demonstration.

Keep-Alive: Hands-on Practical Lab

To see how HTTP/1.1 handles network congestion with Keep-Alive header; we need two things.

  1. Active HTTP/1.0 and HTTP/1.1 servers to compare the difference

  2. Wireshark to capture and analyze traffic

Analyzing Request-Response Behavior in HTTP/1.0

Step 1: Create an html file named index.html and save an image under same directory named ./4k.jpg.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    <h1>This is me</h1>
    <img src="./4k.jpg" alt="">
</body>
</html>

Step 2: Simplest way to start an HTTP/1.0 server is using Python.

python -m http.server 8000 --protocol=http/1.0

This command will spin up a simple http server on port 8000 with specified version in this case HTTP/1.0.

Step 3: Verify server is responding in HTTP/1.0.

curl --head http://0.0.0.0:8000/

http/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.12.3

Step 4: Spin up Wireshark and start capturing traffic on localhost interface Loopback: lo. Then visit the website.

I redacted the unnecessary info but in boxes as you see. Their is a TCP handshake before each request. Now, In next section we will see how does HTTP/1.1 react to this situation.

Analyzing Request-Response Behavior in HTTP/1.1

Step 1: Let's restart http server again on same directory with same code but on HTTP/1.1 version.

python -m http.server 8000 --protocol=http/1.1

Step 2: Verify server is responding in HTTP/1.1.

curl --head http://0.0.0.0:8000/

http/1.1 200 OK
Server: SimpleHTTP/0.6 Python/3.12.3

Step 4: Spin up Wireshark and start capturing traffic on localhost interface Loopback: lo. Then visit the website.

As you can see their was only one TCP handshake for both GET requests. And, In bottom-left you can also spot Connection: keep-alive in GET request headers. Now you know the difference between HTTP/1.0 and HTTP/1.1. You can also test it by yourself; you don't have to trust on theories anymore.

I hope you found this help. In part-2 of this blog post. I'll go more into hacking side where we will explore, "How hacker hack youtube channels?" or "Why cookies were made? to spy on people?" then we might see a practical demonstration too as I always try to do.

Thanks for reading. I hope to see you in part-2.