An Introduction to Hex Editing for Cybercrime Investigators

Photo by freestocks on Unsplash

People interested in entering the growing field of cybercrime investigations can gain a leg up on the competition by learning how to use a hex editor. To “hex edit” means to make changes to the raw binary data (the 1s and 0s) stored on a computer. “Hex” is short for “hexadecimal,” something I will discuss shortly. A hex editor is an application that displays the raw data of a file and allows the user to edit that data. This article describes hexadecimal notation and the process of hex editing, and gives some examples of how cybercrime investigators can use hex editing.

Hexadecimal notation can be a little disorienting for the uninitiated. It is a base-16 system rather than a base-10 one. We are most familiar with base 10, the decimal system. In decimal, we count 0, 1, 2, and so on up to 9, and then we add another digit and start over: 10, 11, up to 19, and so on. Once you get to 99, you add another digit and start over with 100.

A hexadecimal system also starts with 0 and counts up to 9, but instead of moving to a second digit after 9, it uses six more single-digit symbols: A, B, C, D, E, and F. In other words, A = 10, B = 11, C = 12, D = 13, E = 14, and F = 15, for 16 digits in total (0 through F). A new digit is added only after F, so counting continues 10, 11, …, 19, 1A, …, 1F, where hexadecimal 10 equals decimal 16 and hexadecimal 1F equals decimal 31.
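If you have Python available, its built-in int() and format() functions offer a quick way to check this counting scheme. The sketch below is only an illustration: it pairs each hexadecimal digit with its decimal value and converts a few numbers in both directions.

```python
# Counting in hexadecimal versus decimal with Python built-ins.
HEX_DIGITS = "0123456789ABCDEF"

for value, digit in enumerate(HEX_DIGITS):
    # Each single hex digit stands for a decimal value from 0 to 15.
    print(f"hex {digit} = decimal {value}")

# After F a second digit is added: hex 10 is decimal 16, and 1F is 31.
print(int("10", 16))     # 16
print(int("1F", 16))     # 31

# Going the other way: format a decimal number as uppercase hex.
print(format(255, "X"))  # FF
```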

So, why use this weird system?

Computers store information in yet another numbering system, one built from only 1s and 0s. This is a base-2 system, and the values are written in binary notation. Seeing data the way a computer stores it can be very disorienting for a human. Consider the phrase “I hope the Dodgers win the World Series”. Encoded as ASCII text, it would be rendered in binary notation as:

01001001 00100000 01101000 01101111 01110000 01100101 00100000 01110100 01101000 01100101 00100000 01000100 01101111 01100100 01100111 01100101 01110010 01110011 00100000 01110111 01101001 01101110 00100000 01110100 01101000 01100101 00100000 01010111 01101111 01110010 01101100 01100100 00100000 01010011 01100101 01110010 01101001 01100101 01110011

The convention is to group the digits in blocks of 8, as I have done above. Each digit is a bit, and 8 bits make a byte, so each 8-digit block is one byte stored on the computer (here, one character of the phrase). The byte is the basis of the terminology people use when describing the size of a file or how much storage a computer has. For example, many new desktops have a hard drive with at least 500 GB of storage. If a gigabyte is counted as 2^30 bytes, as Windows counts it, 500 GB is 536,870,912,000 bytes; drive manufacturers usually count a gigabyte as 10^9 bytes, so by their measure a 500 GB drive holds 500,000,000,000 bytes.
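The storage arithmetic can be checked in a few lines of Python. This minimal sketch assumes the two common definitions of a gigabyte (2^30 bytes versus 10^9 bytes); the constant names are my own labels, not standard identifiers.

```python
# Bits, bytes, and the two common meanings of "gigabyte".
BITS_PER_BYTE = 8

# Binary definition: 2**30 bytes (what Windows reports as "GB").
BYTES_PER_GIB = 2 ** 30
# Decimal definition: 10**9 bytes (what drive manufacturers usually mean).
BYTES_PER_GB = 10 ** 9

capacity = 500
print(capacity * BYTES_PER_GIB)                  # 536870912000
print(capacity * BYTES_PER_GB)                   # 500000000000
print(capacity * BYTES_PER_GB * BITS_PER_BYTE)   # 4000000000000 bits
```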

Hexadecimal notation helps the human in two ways. First, it is less confusing than a long series of 1s and 0s; it is hard for the eye to pick out important values when it sees only two symbols repeated over and over. Second, because each hexadecimal digit stands for exactly four binary digits, the same information takes only a quarter of the space. The same phrase, “I hope the Dodgers win the World Series”, would be rendered in hexadecimal notation as:

4920 686F 7065 2074 6865 2044 6F64 6765 7273 2077 696E 2074 6865 2057 6F72 6C64 2053 6572 6965 73

Hexadecimal is usually written in groups of 2 digits (1 byte) or sometimes in groups of 4 digits (2 bytes, often called a word), as above.
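The binary and hexadecimal listings above can be reproduced with a short Python sketch. It assumes the phrase is encoded as ASCII text (one byte per character); the function names to_binary and to_hex are illustrative, not from any forensic tool.

```python
def to_binary(text: str) -> str:
    """Each byte as 8 binary digits, separated by spaces."""
    return " ".join(format(b, "08b") for b in text.encode("ascii"))


def to_hex(text: str, bytes_per_group: int = 1) -> str:
    """Uppercase hex, grouped 1 byte (2 digits) or 2 bytes (4 digits, a "word") at a time."""
    digits = text.encode("ascii").hex().upper()
    step = 2 * bytes_per_group
    return " ".join(digits[i:i + step] for i in range(0, len(digits), step))


phrase = "I hope the Dodgers win the World Series"
print(to_binary(phrase))
print(to_hex(phrase, bytes_per_group=1))  # 49 20 68 6F ...
print(to_hex(phrase, bytes_per_group=2))  # 4920 686F 7065 ...

# Each hex digit carries 4 bits, so hex needs a quarter of the digits binary does.
binary_digits = len(phrase) * 8
hex_digits = len(phrase) * 2
print(binary_digits, hex_digits)  # 312 78
```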

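Table 1: Hexadecimal digits and their binary equivalents

0 = 0000    4 = 0100    8 = 1000    C = 1100
1 = 0001    5 = 0101    9 = 1001    D = 1101
2 = 0010    6 = 0110    A = 1010    E = 1110
3 = 0011    7 = 0111    B = 1011    F = 1111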

Hexadecimal has one more benefit, though it is less useful for computer investigators: converting between hexadecimal and binary is easy, because each hex digit corresponds to exactly four binary digits.

Consider the first “word” (2 bytes) of the hex code above: 4920. Using Table 1, we can replace each hex digit with its four-digit binary equivalent.

4 → 0100, 9 → 1001, 2 → 0010, 0 → 0000, so 4920 becomes 0100 1001 0010 0000.

Regrouped into bytes, 4920 becomes 01001001 00100000, which are exactly the first two 8-digit blocks in the binary version of the phrase: the letter “I” followed by a space.

Converting between binary and hexadecimal is of more interest to computer scientists and programmers, because computers ultimately store information as bits. Still, a basic understanding of the relationship between what a hex editor displays and what that display means is important for anyone in a cyber field.
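The digit-by-digit lookup described above is simple enough to express in code. The hypothetical helpers below (hex_to_binary and binary_to_hex) are a sketch of that idea: each hex digit maps to exactly four bits, and each group of four bits maps back to one hex digit.

```python
def hex_to_binary(hex_string: str) -> str:
    """Replace each hex digit with its 4-bit binary equivalent."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_string)


def binary_to_hex(bits: str) -> str:
    """Read binary in 4-bit groups and turn each group back into a hex digit."""
    bits = bits.replace(" ", "")
    if len(bits) % 4:
        raise ValueError("binary length must be a multiple of 4")
    return "".join(format(int(bits[i:i + 4], 2), "X") for i in range(0, len(bits), 4))


print(hex_to_binary("4920"))               # 0100 1001 0010 0000
print(binary_to_hex("01001001 00100000"))  # 4920
```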

Most hex editors have at least four main areas:

  • The address area (usually on the left) displays the position of the data in the file.
  • The hexadecimal area in the center shows the raw bytes of the file.
  • The character area (usually on the right) shows the text characters those bytes represent, where they correspond to printable characters.
  • A file information area shows metadata about the file and information about the raw data within it.
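To see how the address, hexadecimal, and character areas relate, here is a minimal hex-dump sketch in Python. It is not how HxD or any other editor is implemented, and it leaves out the file information area entirely; it only prints each row's offset, its bytes in hex, and the printable characters, with a dot standing in for anything unprintable.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of: address area, hexadecimal area, character area."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        address = f"{offset:08X}"                       # position of the row, in hex
        hex_area = " ".join(f"{b:02X}" for b in chunk)   # the raw bytes
        # Printable ASCII bytes are shown as characters; everything else as "."
        char_area = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{address}  {hex_area:<{width * 3 - 1}}  {char_area}")
    return "\n".join(lines)


print(hex_dump(b"I hope the Dodgers win the World Series"))
```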

You can see these elements in the screenshot below from the popular hex editor HxD. HxD places the file information pane (called the “Data Inspector”) on the far right.

The Four Main Areas of a Hex Editor

You can use your mouse to navigate to a specific place in the file and click to place the cursor there. The cursor is called a caret, and its position, measured in bytes from the start of the file, is called the offset. In the figure above, the caret is at offset 44, which you can read at the bottom of the screen. HxD shows offsets in hexadecimal by default, and counting starts at 0, so offset 44 (hex) is decimal 68, the 69th byte in the file. You can also determine the offset from the row (40) and the column (4): adding them gives 40 + 4 = 44 in hexadecimal.
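The offset arithmetic can be sketched in Python, assuming (as in the HxD screenshot) that offsets are shown in hexadecimal with 16 bytes per row. The read_byte_at helper is hypothetical; for a real file you would pass the object returned by open(path, "rb"), where path is a placeholder for the file you are examining.

```python
import io

# Assumed layout (as in the HxD screenshot): hex offsets, 16 bytes per row.
row_start = 0x40   # address shown at the left of the caret's row
column = 0x4       # column heading above the caret
offset = row_start + column

print(hex(offset))  # 0x44
print(offset)       # 68 in decimal (offsets count from 0)
print(offset + 1)   # 69: the byte's position if the first byte is counted as 1


def read_byte_at(stream, offset: int) -> int:
    """Jump to a zero-based offset in a binary stream and read one byte."""
    stream.seek(offset)
    return stream.read(1)[0]


# In-memory stand-in for a file: the byte at each offset equals the offset.
sample = io.BytesIO(bytes(range(256)))
print(read_byte_at(sample, offset))  # 68
```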

Although on some occasions a single byte is all one is interested in, it is usually a block of bytes that conveys important information. In the figure below, a block beginning at offset 44 is highlighted. Looking at the bottom of the application, we can see that the block runs from offset 44 to 4D (hexadecimal), which is 10 bytes.
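Extracting the highlighted block programmatically takes one slice. This sketch assumes the block's end offset is inclusive, as in the hex editor's display, so one is added before slicing; read_block is an illustrative name, and the sample data is made up.

```python
def read_block(data: bytes, start: int, end_inclusive: int) -> bytes:
    """Return the bytes from start through end_inclusive (both ends included)."""
    # Python slices exclude their end index, hence the + 1.
    return data[start:end_inclusive + 1]


sample = bytes(range(256))                  # made-up data: byte value == offset
block = read_block(sample, 0x44, 0x4D)
print(len(block))                           # 10
print(" ".join(f"{b:02X}" for b in block))  # 44 45 46 47 48 49 4A 4B 4C 4D
```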

A Highlighted Address Block in a Hex Editor

Computer programmers and software engineers may need to use a hex editor for debugging or editing a file, among other things. Cybercrime investigators, however, will have other uses for a hex editor.

In many file formats, bytes at the beginning of a file (and sometimes at the end) are set aside for specific information and metadata. For example, the first several bytes of a file usually identify what type of file it is: a Word document, a JPEG image, an executable program, and so on. This sequence is the file's signature (sometimes called its magic number). File signatures are fixed, well-known values, and they are listed in several places online. Wikipedia has a list of file signatures and the offsets at which they can be found in a file. Cybersecurity expert Gary Kessler maintains a user-friendly file signature database (Figure 4).
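A simple signature check can be sketched by comparing a file's first bytes against a table of known values. The table below is a small, illustrative subset drawn from widely published signatures, not a complete database like Kessler's; identify is a hypothetical helper name.

```python
# A small, illustrative table of file signatures ("magic numbers").
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"%PDF": "PDF document",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 compound file (e.g. legacy .doc/.xls)",
    b"PK\x03\x04": "ZIP container (e.g. .zip, .docx, .odt)",
    b"MZ": "DOS/Windows executable",
    b"GIF8": "GIF image",
}


def identify(header: bytes) -> str:
    """Return the description of the first signature the header starts with."""
    for magic, description in SIGNATURES.items():
        if header.startswith(magic):
            return description
    return "unknown"


# The first 8 bytes as they might be copied from a hex editor.
print(identify(bytes.fromhex("504B0304 14000000")))  # ZIP container (e.g. .zip, .docx, .odt)
```

On a real file, you would pass in the first few bytes read in binary mode.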

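Consider an ODT file. ODT is the file extension for an OpenDocument Text document, the default format of the word processor LibreOffice Writer. An ODT file has 50 4B 03 04 as its first 4 bytes (Figure 4). These bytes are actually the signature of a ZIP archive, because an ODT file is a ZIP package of XML files; DOCX files begin the same way.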
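Because 50 4B 03 04 only says "this is a ZIP archive," telling an ODT file apart from a DOCX or a plain ZIP requires looking inside. OpenDocument packages include an entry named mimetype that states the document type, so a sketch using Python's standard zipfile module can read it. The function name odf_mimetype is my own, and the demo builds a tiny in-memory package rather than using a real document.

```python
import io
import zipfile
from typing import Optional

ZIP_SIGNATURE = bytes.fromhex("504B0304")  # the bytes 50 4B 03 04 ("PK" 03 04)


def odf_mimetype(data: bytes) -> Optional[str]:
    """Return the text of a ZIP package's 'mimetype' entry, or None if absent."""
    if not data.startswith(ZIP_SIGNATURE):
        return None  # not a ZIP container at all
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        if "mimetype" not in archive.namelist():
            return None  # a ZIP (e.g. a DOCX), but not an OpenDocument package
        return archive.read("mimetype").decode("ascii")


# Demo: build a tiny in-memory package that starts the way an ODT file does.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                     compress_type=zipfile.ZIP_STORED)
sample = buffer.getvalue()
print(sample[:4].hex(" ").upper())  # 50 4B 03 04
print(odf_mimetype(sample))         # application/vnd.oasis.opendocument.text
```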

This is important for people in digital forensics because cybercriminals will change the extension of a file to hide it. For example, they may want to hide a Word document containing stolen passwords by changing its “DOC” extension to “PNG.” Most forensic software will automatically flag these signature mismatches (files whose extension does not match their signature). However, the investigator will need to look at the raw data to understand what the file actually is and, if the header bytes themselves have been altered, edit the raw data so that the file is readable by the appropriate application.
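Forensic suites flag mismatches for you, but the underlying check is easy to sketch: look up the signature the extension implies and compare it with the file's first bytes. The EXPECTED_MAGIC table and the extension_mismatch function are hypothetical and deliberately small; note that .docx and .odt share the same ZIP signature, so this check alone cannot tell them apart.

```python
from pathlib import Path

# Hypothetical, minimal mapping from extension to the signature it implies.
EXPECTED_MAGIC = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".pdf": b"%PDF",
    ".doc": b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",
    ".docx": b"PK\x03\x04",
    ".odt": b"PK\x03\x04",
}


def extension_mismatch(filename: str, header: bytes) -> bool:
    """True if the file's first bytes do not match the signature its extension implies."""
    expected = EXPECTED_MAGIC.get(Path(filename).suffix.lower())
    if expected is None:
        return False  # extension not in our small table, so we cannot judge
    return not header.startswith(expected)


# A legacy Word document renamed to .png: the header gives it away.
doc_header = bytes.fromhex("D0CF11E0A1B11AE1")
print(extension_mismatch("passwords.png", doc_header))  # True
print(extension_mismatch("passwords.doc", doc_header))  # False
```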

The File Signature of an ODT File

Hex editing has several other uses. The first is recovering deleted files from a hard drive. Deleted data can sometimes be recovered if the operating system has not overwritten it. As you know, deleting a file (and emptying the trash) does not erase the file; it tells the computer that the space taken up by the file (the literal bits on the disk drive) is free to be used by new data. This is called unallocated space. A forensic investigator can use a hex editor to find the entire file or fragments of it. Piecing a file together in this way is called file carving. You can search the unallocated space of a drive for the file header (file signature) and the file footer, then extract the header, the footer, and everything in between. That should be the file.
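Header-and-footer carving can be sketched for JPEG images, which begin with the bytes FF D8 FF and end with the marker FF D9. This is a naive illustration under strong assumptions: the file is stored contiguously, and the first FF D9 after a header ends the image (real JPEGs can contain earlier FF D9 bytes, for example inside embedded thumbnails). Dedicated carving tools handle these cases far more carefully.

```python
JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker plus the next marker's FF
JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker


def carve_jpegs(raw: bytes) -> list[bytes]:
    """Naive carving: each header through the first footer that follows it."""
    carved = []
    pos = 0
    while True:
        start = raw.find(JPEG_HEADER, pos)
        if start == -1:
            break
        end = raw.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # header without a footer: likely overwritten or fragmented
        end += len(JPEG_FOOTER)
        carved.append(raw[start:end])
        pos = end
    return carved


# Demo on a made-up stretch of "unallocated space" holding two tiny fake JPEGs.
unallocated = (b"\x00" * 10 + b"\xff\xd8\xff\xe0" + b"DATA" + b"\xff\xd9"
               + b"\x00" * 5 + b"\xff\xd8\xff\xdb" + b"MORE" + b"\xff\xd9")
for found in carve_jpegs(unallocated):
    print(found.hex(" ").upper())
```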

A second use of hex editing is identifying timestamps. Timestamps are records of when something happened, and for an investigator attempting to establish when a suspect accessed a file, they are essential. A suspect may try to change a file's timestamps to suggest they did not access it during a specified time: “The file’s last access date was 2:00 PM on 8/25/2019, but witnesses saw me surfing at around that time. It must have been someone else!”

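However, a file's timestamps are rarely stored in just one place. The file system keeps its own records of when each file was created, modified, and accessed (on NTFS, for example, in both the $STANDARD_INFORMATION and $FILE_NAME attributes of the file's Master File Table entry), and many file formats also embed timestamps at predefined offsets inside the file itself. Tools that change a file's dates often alter only some of these copies. Using a hex editor, a computer investigator can find the raw timestamp values, interpret them, and compare them.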
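As one concrete example of interpreting a raw timestamp, NTFS and many other Windows structures store times as a FILETIME: a 64-bit little-endian count of 100-nanosecond intervals since January 1, 1601 (UTC). The sketch below decodes 8 bytes copied out of a hex editor. The helper names are hypothetical, and it assumes you already know the bytes are a FILETIME, which has to come from documentation of the specific structure you are examining. The demo treats the suspect's "2:00 PM on 8/25/2019" as UTC purely for illustration.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME: 64-bit count of 100-nanosecond "ticks" since 1601-01-01 UTC, little-endian.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def decode_filetime(raw: bytes) -> datetime:
    """Turn 8 raw bytes into a UTC datetime (sub-microsecond ticks are dropped)."""
    (ticks,) = struct.unpack("<Q", raw)
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def encode_filetime(moment: datetime) -> bytes:
    """Inverse of decode_filetime, handy for checking values by hand."""
    microseconds = (moment - FILETIME_EPOCH) // timedelta(microseconds=1)
    return struct.pack("<Q", microseconds * 10)


raw = encode_filetime(datetime(2019, 8, 25, 14, 0, tzinfo=timezone.utc))
print(raw.hex(" ").upper())   # the 8 bytes as they would appear in a hex editor
print(decode_filetime(raw))   # 2019-08-25 14:00:00+00:00
```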

A third use is identifying malware embedded in a file. Hackers can embed code in certain areas of a file without damaging the file's usability or functionality, thereby hiding their malware. A common practice is to embed malicious code in a document: the target opens the document, and the malware spreads on their computer. Cybercrime investigators can use a hex editor to look for this kind of hidden code.
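Spotting embedded malware usually takes far more than a hex editor, but one crude indicator is an executable hiding inside another file. Windows executables begin with the letters MZ, and many contain the DOS stub message "This program cannot be run in DOS mode" shortly afterward. The sketch below searches for that pairing. It is a hypothetical triage heuristic only: attackers can strip or alter these bytes, and much document-borne malware is macro or script code rather than an embedded executable.

```python
DOS_STUB_TEXT = b"This program cannot be run in DOS mode"


def find_embedded_executables(data: bytes, window: int = 256) -> list[int]:
    """Offsets of 'MZ' markers followed within `window` bytes by the DOS stub text."""
    hits = []
    start = data.find(b"MZ")
    while start != -1:
        if DOS_STUB_TEXT in data[start:start + window]:
            hits.append(start)
        start = data.find(b"MZ", start + 1)
    return hits


# Demo: a fake "document" with a fake executable header pasted after some text.
fake_doc = b"%PDF-1.4 junk" + b"MZ" + b"\x90" * 76 + DOS_STUB_TEXT + b"\x00" * 10
print(find_embedded_executables(fake_doc))  # [13]
```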

These uses of a hex editor are exciting, but covering them in depth is beyond the scope of this essay. Here are some resources for anyone who wants to explore further:

Hex editing at first appears to be a complex activity, but it is easier than it looks. Identifying file signatures, carving files, identifying timestamps, and more are made simpler with the help of the computer investigation community. Online resources tell the investigator where a given piece of data, such as a timestamp or a file signature, is expected to be found and what values to expect. So once a person knows what to look for, hex editing is a straightforward activity. The key is to embed yourself in that community so that it is easier to find those resources. One well-known organization is the International Society of Forensic Computer Examiners (ISFCE). Another, aimed primarily at law enforcement, is the International Association of Computer Investigative Specialists (IACIS).

Hex editing is also made easier by the fact that computers do not, as yet, change their own behavior the way humans do. Once you learn a process, it stays the same as long as the systems and formats you work with stay the same: file carving, for example, works the same way as long as you are working with the same file storage system. Once you have learned it, you've got it! This is in stark contrast to the criminals using these computers, who are always inventing new types of criminal behavior facilitated by computer technology. This is frustrating, but it will keep computer investigators in a job for a long time!

Written by

Rod is an Associate Professor of Sociology at Old Dominion University.
