Preface

It’s been around 2 weeks since I started reading Beej’s Guide to Network Programming.

I keep finding myself going back to earlier parts, since studying alongside a full-time job is hectic.

Therefore, I intend to start jotting down my notes as I read through the document. The purpose of these notes is just to serve as a quick refresher; they are in no way a replacement for the actual document.

This is an ongoing work, and I will keep updating it as I progress.

2. What is a socket

Socket: a way to speak to other programs using a standard Unix file descriptor.

Where to get this file descriptor?

Calling the socket() system routine returns a socket descriptor. You communicate through it using the specialized send() and recv() socket calls (sendto() for SOCK_DGRAM sockets).

Note: Since everything in Unix is a file (including network connections), the normal read() and write() calls can also be used to communicate through a socket. However, send() and recv() offer much greater control over data transmission.
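
To see that a socket descriptor really is just an int that works with both send() and read(), here is a minimal sketch of my own. It assumes a POSIX system and uses socketpair() with AF_UNIX purely so it runs without any network setup; that call and the helper name echo_through_socketpair are my additions, not something from the guide.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends msg into one end of a connected socket pair and reads it back
 * from the other end. Returns the number of bytes received, or -1 on error.
 * outlen must be at least 1 so there is room for the terminating '\0'. */
ssize_t echo_through_socketpair(const char *msg, char *out, size_t outlen)
{
    int fds[2]; /* socket descriptors are plain ints */

    /* Assumption: AF_UNIX pair, used only to keep the demo local */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;

    ssize_t n = -1;

    /* send() is the socket-specific call; the last argument is flags */
    if (send(fds[0], msg, strlen(msg), 0) != -1) {
        /* plain read() also works, because a socket is a file descriptor */
        n = read(fds[1], out, outlen - 1);
        if (n >= 0)
            out[n] = '\0';
    }

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling echo_through_socketpair("hello", buf, sizeof buf) from main() should leave the same text in buf.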

This doc only covers internet sockets.

2.1 Two Types of Internet sockets

| Stream Sockets | Datagram Sockets |
|---|---|
| SOCK_STREAM | SOCK_DGRAM |
| Reliable two-way communication stream. Ordering of the stream is maintained. Data arrives error-free. | Also called connectionless sockets. No reliability. No ordering guarantee. If a packet arrives, it will be error-free. |
| Examples: telnet, HTTP | Examples: streaming audio, video conferencing, anywhere a few dropped packets don’t hurt anyone. |
| Uses TCP | Uses UDP |
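
The two types above are selected by the second argument to socket(). The following is a small sketch, assuming a POSIX system; the helper name open_and_query_type is hypothetical. It creates a socket of the requested type and then asks the kernel (via the SO_TYPE socket option) which type it actually got.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Creates an IPv4 socket of the requested type (SOCK_STREAM or SOCK_DGRAM),
 * asks the kernel which type it is, then closes it.
 * Returns the reported type, or -1 on error. */
int open_and_query_type(int socktype)
{
    /* 0 lets the system pick the default protocol for this type */
    int fd = socket(AF_INET, socktype, 0);
    if (fd == -1)
        return -1;

    int type = -1;
    socklen_t len = sizeof type;
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == -1)
        type = -1;

    close(fd);
    return type;
}
```

Nothing is connected or sent here; the sketch only shows how the type is requested.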

Why is SOCK_DGRAM connectionless?

Because there is no need to maintain an open connection as you do with stream sockets. You just build a packet, slap an IP header on it with destination information, and send it out.

Why use an unreliable protocol?

For speed. It’s way faster to fire-and-forget than it is to keep track of what has arrived safely and make sure it’s in order. If you’re sending chat messages, TCP is great; if you’re sending 40 positional updates per second of the players in the world, it may not matter much if one or two get dropped, and UDP is a good choice.

2.2 Low level Nonsense & Network Theory

The image below shows how a packet is born and then encapsulated (wrapped) in headers by each protocol in the hierarchy.
[Image: data encapsulation]

OSI Model:

  • Application (where user interacts with network)
  • Presentation
  • Session
  • Transport
  • Network
  • Data Link
  • Physical (hardware, e.g. Ethernet)

A layered model more consistent with Unix:

  • Application Layer (telnet, ftp, etc.)
  • Host-to-Host Transport Layer (TCP, UDP)
  • Internet Layer (IP and routing)
  • Network Access Layer (Ethernet, wi-fi, or whatever)

The kernel builds the Transport Layer and Internet Layer for you, and the hardware does the Network Access Layer. Ah, modern technology.

3. IP Addresses, structs, and Data Munging

3.1 IP Addresses, versions 4 and 6

IPv4: 32-bit addresses, so the supply is limited to 2^32 addresses. E.g.

192.0.2.111

IPv6: 128-bit addresses, which gives 2^128 addresses. E.g.

2001:0db8:c9d2:aee5:73e3:934a:a5ae:9551

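Examples of compressed IPv6 addresses (full form first, compressed form second). Leading zeros in each group can be dropped, and one run of all-zero groups can be replaced by "::".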

2001:0db8:c9d2:0012:0000:0000:0000:0051
2001:db8:c9d2:12::51

2001:0db8:ab00:0000:0000:0000:0000:0000
2001:db8:ab00::

0000:0000:0000:0000:0000:0000:0000:0001
::1

The address ::1 is the loopback address. It always means “this machine I’m running on now”. In IPv4, the loopback address is 127.0.0.1.
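
To check the compressed forms above mechanically, here is a minimal sketch assuming a POSIX system. It parses an address with inet_pton() and prints it back with inet_ntop(), which on common C libraries emits the compressed form. The helper name compress_ipv6 is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Parses an IPv6 address in any valid text form and writes its
 * canonical (compressed) text form into out.
 * Returns out on success, NULL if text is not a valid IPv6 address
 * or out is too small. */
const char *compress_ipv6(const char *text, char *out, size_t outlen)
{
    struct in6_addr addr; /* the raw 128-bit address */

    if (inet_pton(AF_INET6, text, &addr) != 1)
        return NULL;

    return inet_ntop(AF_INET6, &addr, out, (socklen_t)outlen);
}
```

A buffer of INET6_ADDRSTRLEN bytes is large enough for any result.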

3.1.1 Subnets

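Format: IP_ADDRESS/PREFIX_LENGTH, where the number after the slash says how many leading bits of the address form the network portion.

For example

IPv4: 192.0.2.12/30 -> the first 30 bits are the network portion

IPv6: 2001:db8::/32 or

2001:db8:5413:4028::9db9/64.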
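
As a worked example of the /30 case, the sketch below turns a prefix length into a 32-bit netmask and applies it to an address. It is plain integer arithmetic in host byte order; the function names prefix_to_netmask and network_of are hypothetical.

```c
#include <stdint.h>

/* Returns the IPv4 netmask (as a host-order integer) for a prefix
 * length. Values above 32 are treated as 32. */
uint32_t prefix_to_netmask(unsigned prefix_len)
{
    if (prefix_len == 0)
        return 0; /* shifting a 32-bit value by 32 bits is undefined in C */
    if (prefix_len > 32)
        prefix_len = 32;
    return (uint32_t)(UINT32_MAX << (32 - prefix_len));
}

/* Keeps only the network portion of addr (host-order integer). */
uint32_t network_of(uint32_t addr, unsigned prefix_len)
{
    return addr & prefix_to_netmask(prefix_len);
}
```

For /30 the mask is 0xFFFFFFFC, i.e. 255.255.255.252, which leaves 2 bits for hosts.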

3.1.2 Port Numbers

In the Unix-consistent layered model mentioned above, the Internet Layer (IP) is separate from the Host-to-Host Transport Layer (TCP, UDP).

IP Address: Required by Internet Layer (IP)

Port Number: Required by Host-to-Host Transport Layer (TCP,UDP)

Different internet services on the internet have different designated port numbers. Some important ones are:

| Service | Port Number |
|---|---|
| HTTP (Web) | 80 |
| telnet | 23 |
| SMTP | 25 |
| DOOM (game) | 666 |

All well-known port numbers can be found in the IANA port number registry or in the /etc/services file on your Unix system.

Note: Ports under 1024 are often considered special, and usually require special OS privileges to use.

3.2 Byte Order

Say you want to represent the two-byte hex number b34f.

The Internet uses

Big-Endian:

i.e. the number is stored in two sequential bytes, b3 followed by 4f.

OTOH some computers use

Little-Endian:

4f followed by b3.

Therefore, two byte orders matter in network programming.

Network Byte Order

is always Big-Endian

Host Byte Order

depends on host/device, could be Big-Endian or Little-Endian
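
A quick way to find out which order a given host uses is to store b34f in a two-byte integer and look at the first byte in memory. This is a minimal sketch; the function name host_is_big_endian is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if this host stores the most significant byte first
 * (Big-Endian), 0 if it stores the least significant byte first. */
int host_is_big_endian(void)
{
    uint16_t value = 0xb34f;
    unsigned char bytes[2];

    /* Copy the in-memory representation so each byte can be inspected */
    memcpy(bytes, &value, sizeof value);

    return bytes[0] == 0xb3;
}
```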


We have functions in place to do the conversion from one byte order to the other.

When converting, there are two types of numbers that you can convert: short (two bytes) and long (four bytes).
Say you want to convert a short from Host Byte Order to Network Byte Order. Start with “h” for “host”, follow it with “to”, then “n” for “network”, and “s” for “short”: h-to-n-s, or htons() (read: “Host to Network Short”).

| Function | Description |
|---|---|
| htons() | host to network short |
| htonl() | host to network long |
| ntohs() | network to host short |
| ntohl() | network to host long |

Therefore, before sending numbers out on the network, convert them to Network Byte Order.
After receiving numbers from the network, convert them to Host Byte Order before using them.
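
The rule above can be sketched as a pair of helpers that write and read a two-byte number in a buffer, assuming a POSIX system for htons() and ntohs(). The names put_u16 and get_u16 are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Writes a 16-bit value into buf in Network Byte Order (Big-Endian). */
void put_u16(unsigned char *buf, uint16_t host_value)
{
    uint16_t net = htons(host_value); /* host -> network, short */
    memcpy(buf, &net, sizeof net);
}

/* Reads a 16-bit Network Byte Order value from buf and returns it
 * in Host Byte Order. */
uint16_t get_u16(const unsigned char *buf)
{
    uint16_t net;
    memcpy(&net, buf, sizeof net);
    return ntohs(net); /* network -> host, short */
}
```

Whatever the host's own byte order is, put_u16(buf, 0xb34f) leaves b3 in buf[0] and 4f in buf[1].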

3.3 structs

Covers various data types used by the sockets interface

Socket Descriptor

Just a regular int

addrinfo

  • to prep socket address structures for subsequent use
  • used for host/service name lookups
struct addrinfo {
    int              ai_flags;     // AI_PASSIVE, AI_CANONNAME, etc.
    int              ai_family;    // AF_INET, AF_INET6, AF_UNSPEC
    int              ai_socktype;  // SOCK_STREAM, SOCK_DGRAM
    int              ai_protocol;  // use 0 for "any"
    size_t           ai_addrlen;   // size of ai_addr in bytes
    struct sockaddr *ai_addr;      // struct sockaddr_in or _in6
    char            *ai_canonname; // full canonical hostname

    struct addrinfo *ai_next;      // linked list, next node
};

Calling getaddrinfo() returns a pointer to a new linked list of these structures.

| Member | Description |
|---|---|
| ai_family | AF_INET = IPv4, AF_INET6 = IPv6, AF_UNSPEC = whatever |
| ai_next | next element of the linked list |
| ai_addr | pointer to struct sockaddr |
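
Walking the linked list can be sketched as follows, assuming a POSIX system. The AI_NUMERICHOST flag is my addition: it restricts the lookup to literal IP addresses so the example needs no DNS. The helper name lookup_families is hypothetical.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Looks up host/service and stores the ai_family of up to max result
 * nodes in families. Returns the total number of nodes in the list
 * (which may exceed max), or -1 if the lookup fails. */
int lookup_families(const char *host, const char *service,
                    int *families, int max)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof hints);  /* unused fields must be zero */
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6, whatever */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;  /* assumption: literal addresses only */

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    int count = 0;
    for (struct addrinfo *p = res; p != NULL; p = p->ai_next) {
        if (count < max)
            families[count] = p->ai_family;
        count++;
    }

    freeaddrinfo(res); /* the list is allocated by getaddrinfo() */
    return count;
}
```

For example, looking up "127.0.0.1" should report AF_INET for the first node, and "::1" should report AF_INET6.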

sockaddr

holds socket address information

struct sockaddr {
    unsigned short sa_family;   // address family, AF_xxx
    char           sa_data[14]; // 14 bytes of protocol address
};

| Member | Description |
|---|---|
| sa_family | AF_INET = IPv4, AF_INET6 = IPv6 (for the scope of this document) |
| sa_data | contains a destination address and port number for the socket |

It’s tedious to pack the address into sa_data by hand.
To make this easier, struct sockaddr_in (“in” for “Internet”) was created for IPv4.

Important: A pointer to a struct sockaddr_in can be cast to a pointer to a struct sockaddr and vice-versa. So even though connect() wants a struct sockaddr*, you can still use a struct sockaddr_in and cast it at the last minute!

sockaddr_in

    // (IPv4 only--see struct sockaddr_in6 for IPv6)
    
    struct sockaddr_in {
        short int          sin_family;  // Address family, AF_INET
        unsigned short int sin_port;    // Port number
        struct in_addr     sin_addr;    // Internet address
        unsigned char      sin_zero[8]; // Same size as struct sockaddr
    };

Why do this? It makes it easy to reference the elements of the socket address.

sin_zero: pads the structure to the length of a struct sockaddr. It should be set to all zeros with the function memset().

sockaddr_in uses struct in_addr:

in_addr

// (IPv4 only--see struct in6_addr for IPv6)
    
// Internet address (a structure for historical reasons)
struct in_addr {
    uint32_t s_addr; // that's a 32-bit int (4 bytes)
};

If you have declared ina to be of type struct sockaddr_in, then ina.sin_addr.s_addr references the 4-byte IP address in Network Byte Order.
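
Putting the pieces together, here is a minimal sketch of filling a struct sockaddr_in by hand and then passing it as a generic struct sockaddr pointer, assuming a POSIX system. The loopback address and the helper names fill_loopback_v4 and family_of are my choices for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fills *sa with the IPv4 loopback address and the given port. */
void fill_loopback_v4(struct sockaddr_in *sa, unsigned short port)
{
    memset(sa, 0, sizeof *sa);                    /* also zeroes sin_zero */
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);                   /* Network Byte Order */
    sa->sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* 127.0.0.1 */
}

/* Accepts the generic pointer type that calls such as connect() expect
 * and reads the address family through it. */
int family_of(const struct sockaddr *addr)
{
    return addr->sa_family;
}
```

The cast happens at the call site, e.g. family_of((struct sockaddr *)&ina).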

IPv6 has a similar struct called sockaddr_in6.

sockaddr_storage

Designed to be large enough to hold both IPv4 and IPv6 structures.
It is for the times when you don’t know in advance whether a call is going to fill out your struct sockaddr with an IPv4 or an IPv6 address.

struct sockaddr_storage {
    sa_family_t  ss_family;     // address family

    // all this is padding, implementation specific, ignore it:
    char      __ss_pad1[_SS_PAD1SIZE];
    int64_t   __ss_align;
    char      __ss_pad2[_SS_PAD2SIZE];
};

The important bit here is ss_family.

ss_family is either AF_INET or AF_INET6.

Based on the ss_family value, the structure can be cast to sockaddr_in or sockaddr_in6.
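
The check-then-cast pattern can be sketched like this, assuming a POSIX system. Reading the port is just an example of something you might do after the cast, and the helper name port_of is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Returns the port number (Host Byte Order) stored in *ss,
 * or -1 if the address family is neither IPv4 nor IPv6. */
int port_of(const struct sockaddr_storage *ss)
{
    if (ss->ss_family == AF_INET) {
        const struct sockaddr_in *v4 = (const struct sockaddr_in *)ss;
        return ntohs(v4->sin_port);
    }
    if (ss->ss_family == AF_INET6) {
        const struct sockaddr_in6 *v6 = (const struct sockaddr_in6 *)ss;
        return ntohs(v6->sin6_port);
    }
    return -1; /* some other family; not covered in these notes */
}
```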

3.4 IP Addresses, Part Deux