Preface
It’s been around 2 weeks since I started reading Beej’s Guide to Network Programming.
I keep going back to previous parts, because studying alongside a full-time job is hectic.
Therefore, I intend to jot down my notes as I read through the document. They are meant to serve as a quick refresher and are in no way a replacement for the actual document.
This is an ongoing work, and I will keep updating it as I progress.
2. What is a socket
Socket: a way to speak to other programs using standard Unix file descriptors.
Where do you get this file descriptor?
Calling the socket() system routine returns a socket descriptor. You communicate through it using the specialized send() and recv() socket calls (sendto() and recvfrom() for SOCK_DGRAM).
Note: Since everything in Unix is a file (including network connections), the normal read() and write() calls can also be used to communicate through a socket. But send() and recv() offer much greater control over data transmission.
This doc only covers internet sockets.
2.1 Two Types of Internet sockets
Stream Sockets | Datagram Sockets |
---|---|
SOCK_STREAM | SOCK_DGRAM |
Reliable, two-way connected communication stream. Data arrives in the order it was sent, and error-free. | Also called connectionless sockets. No reliability and no ordering guarantee. If a datagram arrives, its data will be error-free. |
Examples: telnet, HTTP | Examples: streaming audio, video conferencing, and other uses where a few dropped packets don’t hurt. |
Uses TCP | Uses UDP |
Why is SOCK_DGRAM connectionless?
Because there is no need to maintain an open connection as you do with stream sockets. You just build a packet, slap an IP header on it with destination information, and send it out.
Why use an unreliable protocol?
For speed. It’s way faster to fire-and-forget than to keep track of what has arrived safely and make sure it’s in order. If you’re sending chat messages, TCP is great; if you’re sending 40 positional updates per second of the players in the world, maybe it doesn’t matter so much if one or two get dropped, so UDP is a good choice.
2.2 Low level Nonsense & Network Theory
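How a packet is born and encapsulated: the data is wrapped (“encapsulated”) in a header by the first protocol (say, TFTP), then the whole thing is encapsulated again by each protocol below it (say, UDP, then IP), and finally by the protocol on the hardware layer (say, Ethernet).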
OSI Model:
- Application (where user interacts with network)
- Presentation
- Session
- Transport
- Network
- Data Link
- Physical (hardware, e.g. Ethernet)
A layered model more consistent with Unix:
- Application Layer (telnet, ftp, etc.)
- Host-to-Host Transport Layer (TCP, UDP)
- Internet Layer (IP and routing)
- Network Access Layer (Ethernet, wi-fi, or whatever)
The kernel builds the Transport Layer and Internet Layer for you, and the hardware does the Network Access Layer. Ah, modern technology.
3. IP Addresses, structs, and Data Munging
3.1 IP Addresses, versions 4 and 6
IPv4: 32-bit addresses, so the address space is limited to 2^32 (about 4.3 billion) addresses. E.g.
192.0.2.111
IPv6: 128-bit addresses, so there are 2^128 possible addresses. E.g.
2001:0db8:c9d2:aee5:73e3:934a:a5ae:9551
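Examples of compressed IPv6 addresses (leading zeros in each group can be dropped, and one run of consecutive all-zero groups can be replaced with ::):
2001:0db8:c9d2:0012:0000:0000:0000:0051 -> 2001:db8:c9d2:12::51
2001:0db8:ab00:0000:0000:0000:0000:0000 -> 2001:db8:ab00::
0000:0000:0000:0000:0000:0000:0000:0001 -> ::1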
The address ::1 is the loopback address. It always means “this machine I’m running on now”. In IPv4, the loopback address is 127.0.0.1.
3.1.1 Subnets
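Format: IP_ADDRESS/PREFIX_LENGTH, where the number after the slash says how many leading bits of the address are the network portion (the remaining bits identify the host).
For example
IPv4: 192.0.2.12/30 -> the first 30 bits are the network, equivalent to the netmask 255.255.255.252
IPv6: 2001:db8::/32 or 2001:db8:5413:4028::9db9/64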
3.1.2 Port Numbers
Based on the Unix-style layered model above, the Internet Layer (IP) is separate from the Host-to-Host Transport Layer (TCP, UDP).
IP address: used by the Internet Layer (IP) to identify the host
Port number: used by the Host-to-Host Transport Layer (TCP, UDP); a 16-bit number that acts like the local address for the connection on that host
Different services on the Internet have different well-known port numbers. Some important ones are:
Service | Port Number |
---|---|
HTTP (Web) | 80 |
telnet | 23 |
SMTP | 25 |
DOOM (game) | 666 |
All well-known port numbers can be found in the IANA port list or in the /etc/services file on your Unix system.
Note: Ports under 1024 are often considered special, and usually require special OS privileges to use.
3.2 Byte Order
Say you want to represent the two-byte hex number b34f.
The Internet uses Big-Endian, i.e. it stores the number in two sequential bytes: b3 followed by 4f.
On the other hand, some computers (e.g. Intel-compatible processors) use Little-Endian: 4f followed by b3.
Therefore, network programming has to deal with two byte orders:
- Network Byte Order: always Big-Endian
- Host Byte Order: depends on the host/device; could be Big-Endian or Little-Endian
There are functions to convert numbers from one byte order to the other.
There are two sizes of number you can convert: short (two bytes) and long (four bytes; htonl() and ntohl() actually operate on uint32_t, even on platforms where a C long is 8 bytes).
Say you want to convert a short from Host Byte Order to Network Byte Order. Start with “h” for “host”, follow it with “to”, then “n” for “network”, and “s” for “short”: h-to-n-s, or htons() (read: “Host to Network Short”).
Function | Description |
---|---|
htons() | host to network short |
htonl() | host to network long |
ntohs() | network to host short |
ntohl() | network to host long |
Therefore, numbers should be converted to Network Byte Order before being sent over the network, and converted back to Host Byte Order after being received.
3.3 structs
This section covers the various data types used by the sockets interface.
Socket Descriptor
Just a regular int
addrinfo
- to prep socket address structures for subsequent use
- used for host/service name lookups
```c
struct addrinfo {
    int              ai_flags;     // AI_PASSIVE, AI_CANONNAME, etc.
    int              ai_family;    // AF_INET, AF_INET6, AF_UNSPEC
    int              ai_socktype;  // SOCK_STREAM, SOCK_DGRAM
    int              ai_protocol;  // use 0 for "any"
    size_t           ai_addrlen;   // size of ai_addr in bytes
    struct sockaddr *ai_addr;      // struct sockaddr_in or _in6
    char            *ai_canonname; // full canonical hostname
    struct addrinfo *ai_next;      // linked list, next node
};
```
Calling getaddrinfo() fills in a pointer to a new linked list of these structures (the function's own return value is 0 on success or a nonzero error code).
members | description |
---|---|
ai_family | AF_INET = IPv4, AF_INET6 = IPv6, AF_UNSPEC = either |
ai_next | next element of the linked list |
ai_addr | pointer to struct sockaddr |
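A minimal getaddrinfo() sketch, assuming a POSIX system. The hints struct is zeroed with memset() and then filled in; the result list is walked through ai_next and released with freeaddrinfo(). AI_NUMERICHOST is used so the example never needs DNS; with AF_UNSPEC, "127.0.0.1" should come back as AF_INET and "::1" as AF_INET6. count_addrinfo() and the port "3490" are hypothetical.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolves a numeric host and port, walks the linked list via ai_next,
 * stores the first entry's family in *first_family (if non-NULL), and
 * returns how many entries were found, or -1 on error. */
int count_addrinfo(const char *host, const char *port, int *first_family)
{
    struct addrinfo hints, *res, *p;
    memset(&hints, 0, sizeof hints);   /* unused fields must be zero */
    hints.ai_family = AF_UNSPEC;       /* IPv4 or IPv6, whichever fits */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;   /* numeric host only: no DNS lookup */

    int status = getaddrinfo(host, port, &hints, &res);
    if (status != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(status));
        return -1;
    }

    int count = 0;
    for (p = res; p != NULL; p = p->ai_next) {
        if (count == 0 && first_family)
            *first_family = p->ai_family;
        count++;
    }

    freeaddrinfo(res);                 /* the list is heap-allocated */
    return count;
}

int main(void)
{
    int fam = 0;
    int n = count_addrinfo("127.0.0.1", "3490", &fam);
    printf("%d result(s), first family %s\n", n,
           fam == AF_INET ? "AF_INET" : fam == AF_INET6 ? "AF_INET6" : "other");
    return 0;
}
```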
sockaddr
holds socket address information
```c
struct sockaddr {
    unsigned short sa_family;    // address family, AF_xxx
    char           sa_data[14];  // 14 bytes of protocol address
};
```
members | description |
---|---|
sa_family | AF_INET (IPv4) or AF_INET6 (IPv6), for the scope of this document |
sa_data | contains a destination address and port number for the socket |
It’s tedious to pack the address into sa_data by hand.
To facilitate this, struct sockaddr_in (“in” for “Internet”) was created for use with IPv4.
Important: A pointer to a struct sockaddr_in can be cast to a pointer to a struct sockaddr and vice versa. So even though connect() wants a struct sockaddr*, you can still use a struct sockaddr_in and cast it at the last minute!
sockaddr_in
```c
// (IPv4 only--see struct sockaddr_in6 for IPv6)
struct sockaddr_in {
    short int          sin_family;  // Address family, AF_INET
    unsigned short int sin_port;    // Port number
    struct in_addr     sin_addr;    // Internet address
    unsigned char      sin_zero[8]; // Same size as struct sockaddr
};
```
Why do this? It makes it easy to reference elements of the socket address.
sin_zero: pads the structure to the length of a struct sockaddr. It should be set to all zeros with memset().
sin_port: must be in Network Byte Order (use htons()).
sockaddr_in uses struct in_addr for its sin_addr field:
in_addr
```c
// (IPv4 only--see struct in6_addr for IPv6)
// Internet address (a structure for historical reasons)
struct in_addr {
    uint32_t s_addr; // that's a 32-bit int (4 bytes)
};
```
If you have declared ina to be of type struct sockaddr_in, then ina.sin_addr.s_addr references the 4-byte IP address in Network Byte Order.
IPv6 has similar structs: struct sockaddr_in6 and struct in6_addr.
sockaddr_storage
Designed to be large enough to hold both IPv4 and IPv6 structures.
Useful when you don’t know in advance whether a call is going to fill out your struct sockaddr with an IPv4 or an IPv6 address.
```c
struct sockaddr_storage {
    sa_family_t ss_family;            // address family

    // all this is padding, implementation specific, ignore it:
    char        __ss_pad1[_SS_PAD1SIZE];
    int64_t     __ss_align;
    char        __ss_pad2[_SS_PAD2SIZE];
};
```
The important bit here is ss_family, which will be AF_INET or AF_INET6.
Based on the ss_family value, the structure can be cast to a struct sockaddr_in or a struct sockaddr_in6.