Hacker News

Your hex editor should color-code bytes

580 points by tobr last Tuesday at 9:52 AM | 152 comments

Comments

dspillett yesterday at 9:43 AM

Everything should try to do some basic syntax highlighting, IMO. Not too much, or it just becomes a sea of formatting that doesn't help at all. It is surprising how much difference a little splash of colour can make if it isn't overdone. If possible, always include configuration options for the user, though: those with colour-blindness can tweak things to their needs, those who are just fussy can make the output fit their finely adjusted system-wide colour schemes¹, and, even better, where you can, allow bold/italic/other styles as well as colours so that those who barely see colour at all can play too.

Of course none of this helps those using screen-readers and other assistive tech, so make sure that all your fancy colouring & such is additive: if it is all “lost”, no meaning is lost with it.

--------

[1] Some people can be very vocal about this, more so than if highlighting isn't possible at all. If you give any output formatting they'll expect you to match, or be able to be made to match, their preferred style.

cuechan yesterday at 10:57 AM

For anyone who regularly has to look at/analyze binary files, I highly recommend ImHex [1].

It's a hex editor built with imgui and has a lot of built-in tools. IMO the best feature is the data structure editor. You can write a data type definition similar to C, and it overlays it on the hexdump and parses it in a structured way while you type.

It also has a node based editor.

1: https://github.com/WerWolv/ImHex

roelschroeven yesterday at 11:03 AM

When you're going to color-code bytes in a hex dump, I would expect each ASCII character in the right column to have the same color as the hex byte in the left column, making it easier to pair them. I wonder why that wasn't done here.
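One way to get that pairing, sketched in Python under a few assumptions: three byte classes (0x00, printable ASCII, everything else), arbitrarily chosen ANSI colors, and made-up function names. The point is only that the color is picked once per byte and reused for both the hex cell and the ASCII cell.

```python
# Illustrative byte classes and ANSI colors (assumptions, not any tool's scheme).
RESET = "\x1b[0m"
COLORS = {
    "null": "\x1b[90m",   # grey for 0x00
    "ascii": "\x1b[36m",  # cyan for printable ASCII 0x20-0x7E
    "other": "\x1b[33m",  # yellow for everything else
}


def byte_class(b: int) -> str:
    """Put a byte value into one of the three illustrative classes."""
    if b == 0x00:
        return "null"
    if 0x20 <= b <= 0x7E:
        return "ascii"
    return "other"


def dump_line(offset: int, chunk: bytes) -> str:
    """Render one line; each byte gets the same color in both columns."""
    hex_cells = []
    text_cells = []
    for b in chunk:
        color = COLORS[byte_class(b)]
        hex_cells.append(f"{color}{b:02x}{RESET}")
        ch = chr(b) if 0x20 <= b <= 0x7E else "."
        text_cells.append(f"{color}{ch}{RESET}")
    return f"{offset:08x}  {' '.join(hex_cells)}  |{''.join(text_cells)}|"


def hexdump(data: bytes, width: int = 16) -> list:
    """Split data into width-byte rows (the last row is not padded)."""
    return [dump_line(i, data[i:i + width]) for i in range(0, len(data), width)]


if __name__ == "__main__":
    for line in hexdump(b"Hello\x00\xc0 world!\xff"):
        print(line)
```

Swapping `byte_class` for a finer classification changes nothing else, since both columns read the same color.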

bwiggs yesterday at 1:18 PM

DEFCON30, Mayhem CTF.

We were given a file full of random bytes. The flag was in there somewhere. It was too random to be encrypted, there wasn't any structure. `file` didn't return anything, truly just a bag of bytes.

I had decided to install `hexyl` as an alternative to some of the other hex editors installed on my Linux machine. All the bytes were colored grey.

I scrolled the file and noticed a blip of yellow. A random golden `{` amongst all the noise. Weird.

The next colored byte was a `C`, then `T`, `F`.

---

At that time, I was mostly using HexFiend to look at raw files, which didn't have byte coloring. For DEFCON I had decided to drive my Linux machine. I had ghex installed, but I had also decided to install and try `hexyl` via the CLI. So it was purely by chance that I was seeing the bytes in color at all. I eventually posted an issue to ghex to add color support. https://gitlab.gnome.org/GNOME/ghex/-/issues/60

I need to see if I can find the file and add it to that blog post. https://bwiggs.com/posts/2023-08-31-hacking-in-color/

NooneAtAll3 yesterday at 10:54 AM

Why did the author decide that the best way to demonstrate his idea would be by cutting contrast in half?

Color-coding might be a great solution, but you don't really know beforehand which byte values are important. Manually selecting C0 to make it stand out is just Ctrl+F with extra steps. (But I wouldn't mind something like coloring 00 separately from ASCII, and ASCII separately from the rest.)

myfonj yesterday at 1:14 PM

When (rarely) using hex editors, one thing constantly comes to my mind: isn't base 16, written with Arabic digits plus Latin letters, a bit awkward for a "skimmable" overview? Color-coding indeed helps immensely there, but wouldn't simply letting the bits and bops shine in eight-bit clusters, resembling the "physical" shape of the eight-bit byte, be somewhat more readable?

We even have characters in Unicode for representing the 0..255 variations, actually two distinct groups: Braille (arguably a bit of a misuse for binary) and octants (accompanied by their older predecessors). So what would be

    |65|97|66|98|67|99|32|126|32|72|101|108|108|111|44|32|109|111|109|33|32|240|159|166|132|
in base-10 or

    |41|61|42|62|43|63|20|7e|20|48|65|6c|6c|6f|2c|20|6d|6f|6d|21|20|f0|9f|a6|84|
in base-16, could be

    |⢈|⢊|⡈|⡊|⣈|⣊|⠂|⡾|⠂|⠌|⢪|⠮|⠮|⣮|⠦|⠂|⢮|⣮|⢮|⢂|⠂|⠛|⣵|⡣|⠡|
in Braille, or

    |𜵲|𜵶|𜴷|𜴻|𜶭|𜶱|𜴀|𜵯|𜴀|𜴋|𜶔|𜴭|𜴭|𜷟|𜴫|𜴀|𜶢|𜷟|𜶢|𜵴|𜴀|(⁕)|𜷢|𜵖|𜴙|
using octants.

Most significant bit is at the top left here, the least one is bottom right -- it felt somewhat intuitive to me this way, your intuition may differ, obviously.

Or, naturally, "AaBbCc ~ Hello, mom! <Unicorn Emoji>" as a "UTF-8" text.

Try: http://myfonj.github.io/tst/byte-dec-hex-braille-octant.html Test (with an added "CSS" variant and "highlight" of empty dots): http://myfonj.github.io/tst/byte-visualisation-exploration.h...

(⁕) HN apparently eats upper-half block. Amusing that only this particular ("old", as referred earlier) one got filtered out…

Also a caveat: Android phones have a messed-up Braille block due to an outdated, broken embedded font, so all patterns with dots in the left half appear in the right instead. Long reported, not fixed, IIRC.
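For the curious, a Python sketch of the Braille mapping described above. It assumes the row-by-row bit order (MSB top-left, LSB bottom-right) and relies only on the Unicode rule that pattern U+2800+m has dot k raised when bit k-1 of m is set; the function names are illustrative. With this mapping, "AaBbCc" comes out as the first six cells of the Braille line above.

```python
# Row-by-row layout of the 2x4 cell: MSB at top-left, LSB at bottom-right.
# Unicode Braille (U+2800 block) assigns dot k to bit k-1 of the offset:
# left column is dots 1,2,3,7 and right column is dots 4,5,6,8, top to bottom.
DOT_FOR_BIT = {
    7: 0x01, 6: 0x08,  # row 1: dot 1, dot 4
    5: 0x02, 4: 0x10,  # row 2: dot 2, dot 5
    3: 0x04, 2: 0x20,  # row 3: dot 3, dot 6
    1: 0x40, 0: 0x80,  # row 4: dot 7, dot 8
}


def byte_to_braille(b: int) -> str:
    """Return the Braille pattern character for one byte value."""
    mask = 0
    for bit, dot in DOT_FOR_BIT.items():
        if b & (1 << bit):
            mask |= dot
    return chr(0x2800 + mask)


def to_braille(data: bytes) -> str:
    """Render a byte string as one Braille cell per byte."""
    return "".join(byte_to_braille(b) for b in data)


if __name__ == "__main__":
    print("|".join(byte_to_braille(b) for b in b"AaBbCc ~ Hello, mom!"))
```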

jcalvinowens yesterday at 4:24 PM

If you just want to see patterns, and don't actually need to see the values, you can go a step further and simply visualize the data as a bitmap, e.g.

    dd if=/dev/urandom bs=$((256*256)) count=1 | display -size 256x256 -depth 8 GRAY:-
You can do the same thing with audio, which makes different sorts of patterns obvious, e.g.

    dd if=/dev/urandom bs=$((256*256)) count=1 | aplay -c 1
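The same bitmap idea without ImageMagick, as a small Python sketch: it prepends a binary PGM (P5) header so each byte becomes one grey pixel, 256 pixels per row by default. The function name, the script name in the usage comment, and the zero-padding of the last row are choices made for this example.

```python
import sys


def bytes_to_pgm(data: bytes, width: int = 256) -> bytes:
    """Wrap raw bytes in a binary PGM (P5) header: one byte, one grey pixel."""
    height = -(-len(data) // width)  # ceiling division
    pixels = data.ljust(width * height, b"\x00")  # pad the last row with black
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + pixels


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 bytes_to_pgm.py input.bin out.pgm
    with open(sys.argv[1], "rb") as src, open(sys.argv[2], "wb") as dst:
        dst.write(bytes_to_pgm(src.read()))
```

The resulting `.pgm` opens in any Netpbm-aware viewer, including ImageMagick's `display`.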
Someone yesterday at 12:17 PM

The first example is “go ahead, try to find the single C0 in these bytes” and then argues one should highlight C0 bytes.

If that’s true, how does the tool know I will be looking for C0 bytes and not for 03, D3, etc? The logical conclusion of that would be that the hex editor should uniquely color code every byte. And following the other examples even that’s not enough.

The proposed solution is to create groups of byte values that each get their unique color. I think that helps, but we can do better: add a search feature. That tells your editor what you are looking for. Once you enter a search string, it can highlight all hits.

Yes, “colorful output in a hexdump is useful for the same reason that syntax highlighting for code is useful”, but do you know what syntax highlighting needs? Knowledge of the expected content of a file. Without that, a hex editor at best can guess at how to color-code stuff.

IMO, if you want to add syntax coloring to a hex editor, give it pluggable syntax coloring and heuristics for deciding which one to use when.

While at it, also let those plugins control where to break lines, whether to show hex at all (why show it at all if a file has a few paragraphs of English text or an array of IEEE doubles?), etc.

Those plug-ins will make errors and sometimes, users will want to see all byte values, so you’ll need a way for the user to override them.
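A minimal Python sketch of search-driven highlighting, where the query rather than a fixed palette decides what stands out. It assumes the query is a raw byte string, allows overlapping matches, and uses reverse video (ANSI SGR 7) so it does not fight the user's color theme; the names are illustrative.

```python
def match_offsets(data: bytes, needle: bytes) -> set:
    """Every byte offset covered by a (possibly overlapping) match of needle."""
    hits = set()
    if not needle:
        return hits
    start = data.find(needle)
    while start != -1:
        hits.update(range(start, start + len(needle)))
        start = data.find(needle, start + 1)  # +1 so overlapping matches count
    return hits


def render_hex(data: bytes, hits: set) -> str:
    """Space-separated hex with matched bytes in reverse video."""
    cells = []
    for i, b in enumerate(data):
        cell = f"{b:02x}"
        cells.append(f"\x1b[7m{cell}\x1b[0m" if i in hits else cell)
    return " ".join(cells)


if __name__ == "__main__":
    data = b"line one\r\nline two\r\n"
    print(render_hex(data, match_offsets(data, b"\r\n")))
```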

delta_p_delta_x yesterday at 10:30 AM

  > Your hex editor should colour-code bytes so it is easier for users to distinguish patterns
  > Article is fully in lowercase, which makes it harder for readers to make out sentences and the flow of the article
  > mfw the irony
bandrami yesterday at 8:31 AM

Emacs's hexl-mode does this, incidentally, though annoyingly by default it makes all faces the same color. I never understood why it defines the faces but then doesn't customize them.

skalidindi3 yesterday at 11:59 PM

https://github.com/skalidindi3/cxxd

I wrote this a LONG while back. At the time, I was fresh out of college and my first job included a lot of reverse engineering communication protocols to make machines work together for automation purposes. I will personally testify to how useful it was to see visual patterns to aid in this. The single biggest benefit was seeing that one particular protocol switched endianness WITHIN a specific packet.
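Why an endianness switch is hard to spot without help: the same two bytes decode to different values depending on byte order. A tiny Python illustration with the standard `struct` module; the packet layout is invented for the example.

```python
import struct

# Hypothetical 4-byte packet: a big-endian u16 followed by a little-endian u16.
packet = bytes([0x01, 0x02, 0x01, 0x02])

(be_field,) = struct.unpack_from(">H", packet, 0)  # bytes 0-1 read big-endian
(le_field,) = struct.unpack_from("<H", packet, 2)  # bytes 2-3 read little-endian

print(hex(be_field), hex(le_field))  # 0x102 0x201
```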

ChrisRR yesterday at 11:35 AM

What a bad way to illustrate your point, using such similar-looking pastel colours.

kokakiwi yesterday at 10:58 AM

ImHex (https://imhex.werwolv.net/) is also a really nice Hex editor with tons of plugins (patterns, file support, etc.) and even an embedded language for adding more patterns easily

orphea yesterday at 11:50 AM

I get the idea but those specific examples are awful - not enough contrast.

nticompass yesterday at 11:23 AM

I used to use wxHexEditor and that had a feature where I could select a section of the file and highlight it in a color. When I was working to decode a certain file format, I used that to color-code different sections of the file and it was super useful. Those color-codes were stored in a separate file so you could load them back in.

Archelaos yesterday at 9:24 AM

This article made me think about how I could use similar techniques to colour-code the data in database tables. Has anyone here tried that, and do you have recommendations on where to start?

js8 yesterday at 8:32 AM

I think semantic coloring (based on structure) is more useful. Also (I can't help it, as someone working with z/OS), if you really want to make hex output readable, I recommend using a big-endian machine.

psychoslave yesterday at 9:01 AM

That said, even colored, these dumps still feel unappealing to me; so yes, this is admittedly a subjective gut reaction jumping into the conversation. I get that an occult form can also be an attractive force.

The post puts on the table an interesting point about how to improve the presentation layer to fit what human cognition is good at spotting (in general, or at least for the expected audience with some training). And it does start proposing something with these color schemes. But isn’t it kind of missing the forest for the trees? Why do we even render with [0123456789ABCDEF], when a specific set of (colored/image-based?) glyphs could make what’s on the table more obvious? Or, going even beyond the hexadecimal grouping, wouldn’t it be more relevant to render something far easier to grasp "intuitively", without several layers of internalized interpretation acquired through acculturation?

eviks today at 3:42 AM

And the leading zero should be a space instead; that will cut down on visual noise in the post's examples a lot and let your color codes do the rest.
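In Python, the format mini-language already does this: a width of 2 without the `0` flag pads with spaces instead of zeros. A one-function sketch (the function name is made up):

```python
def hex_cell(b: int) -> str:
    """Two-character hex cell with a space instead of a leading zero."""
    return f"{b:2x}"  # 0x0a -> " a", 0xc0 -> "c0"


print(" ".join(hex_cell(b) for b in b"\x00\x0a\x7f\xc0"))
```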

randusername yesterday at 1:11 PM

I think this is a cool idea.

I'd want to take it further by using full RGB and cycling through some colormaps with different properties. Sequential, diverging, cyclic like in matplotlib.

https://matplotlib.org/stable/users/explain/colors/colormaps...

Can't think of a specific use-case off the top of my head, but sometimes I just want the "feel" of the data when I'm plotting something, and maybe the same scattershot approach would pay off at some point on unknown hex data if it was an option.

sidewndr46 yesterday at 1:52 PM

Compare a simple high-contrast display to one that makes it difficult to read and hurts my eyes? Sure! Absolutely!

I'll pass thank you

nickwanninger yesterday at 3:58 PM

I added type-based color printing to the hexdump in my kernel [1] if anyone wants that code. It was sometimes instrumental in finding bugs quickly in the wee hours of the night, especially when you have heap corruption.

----

[1] https://github.com/ChariotOS/chariot/blob/e046849c668458d25e...

red_admiral yesterday at 11:00 AM

My hex editor should let me turn syntax highlighting on and off, follow my personal color theme (and not produce light gray on white in the terminal), and let me highlight specific things I'm searching for, like 0D 0A or FF FE.

xvilka yesterday at 4:02 PM

Rizin[1][2] does exactly this, also there is a compact hex-II[3] mode.

[1] https://rizin.re

[2] https://github.com/rizinorg/rizin

[3] https://speakerdeck.com/ange/no-more-dumb-hex

dhosek yesterday at 3:49 PM

I found the coloring in most of the examples to be more distracting than helpful. I can see cases where it could be helpful (e.g., highlighting bytes in the 0x20–0x7E range for spotting ASCII strings, or a fancier one that can identify UTF-8 strings, or better still, invalid sequences in what might otherwise be UTF-8), but most of the cases here didn’t really help all that much for me.
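Flagging invalid UTF-8 can lean on the standard decoder. A Python sketch (the function name is made up) that repeatedly decodes, records the span each `UnicodeDecodeError` reports, and resumes after it; rescanning the remaining slice makes it quadratic in the worst case, which is fine for a sketch but not for huge files.

```python
def invalid_utf8_offsets(data: bytes) -> set:
    """Offsets of bytes that the strict UTF-8 decoder rejects."""
    bad = set()
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # everything from pos onward decodes cleanly
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice starting at pos
            bad.update(range(pos + err.start, pos + err.end))
            pos += err.end
    return bad


if __name__ == "__main__":
    print(sorted(invalid_utf8_offsets(b"ok\xffok\x80")))
```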

Xophmeister yesterday at 11:11 PM

If you have a Kaitai spec for the format, then its IDE[0] will do structural highlighting for you. This is web-based, but it doesn’t seem like too much of a stretch to implement a TTY version.

[0]: ide.kaitai.io

Findecanor yesterday at 7:39 PM

If you're making a hex editor and going to have colour coding, I'd think you'd spend some effort to make the colouring schemes configurable, and easy to configure and change. Maybe load and save them as separate files.

Different colouring schemes for different types of data.

soegaard yesterday at 1:52 PM

This was a great article and inspired me to add support for binary files in `peek`.

https://soegaard.github.io/peek/#%28part._binary-files%29

For me the key insight is that similar values should get similar colors. And since Fx and 0x are "similar" the color palette should be cyclic.
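A Python sketch of a cyclic palette, assuming a 24-bit color terminal: each byte value is placed on the HSV hue wheel, so neighboring values get similar colors and 0xFF wraps around next to 0x00. It uses the standard `colorsys` module and is not how `peek` implements it; the function names are illustrative, and a perceptually uniform cyclic colormap would even out the brightness differences that raw HSV hues have.

```python
import colorsys


def byte_rgb(b: int) -> tuple:
    """Place a byte on the hue wheel; 0xFF lands right next to 0x00."""
    r, g, bl = colorsys.hsv_to_rgb(b / 256, 1.0, 1.0)
    return round(r * 255), round(g * 255), round(bl * 255)


def colored_hex(b: int) -> str:
    """Hex cell with a 24-bit ANSI foreground color (truecolor terminals)."""
    r, g, bl = byte_rgb(b)
    return f"\x1b[38;2;{r};{g};{bl}m{b:02x}\x1b[0m"


if __name__ == "__main__":
    print(" ".join(colored_hex(b) for b in range(0, 256, 8)))
```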

fleebee yesterday at 12:33 PM

> having more colors makes it possible to recognize more complex patterns

The implicit cost here is that the simple patterns become harder to recognize when every byte is only subtly differently colored. Rather than give everything a different color, I'd rather have the important stuff highlighted.

In the comparisons given, I think hexyl's highlighting scheme is significantly more useful.

ape4 yesterday at 6:27 PM

As a next level beyond coloring... how about adding some interactivity? How about a slider to control the brightness of each type of byte? Turn everything but text down to 10% when you're looking for some words in a binary file.

taeric yesterday at 2:31 PM

I find it funny that I found the "single C0" pretty much instantly.

I grant that the post largely has a point, mind you. But scanning for a needle in a haystack is something that you just don't often do?

I am, of course, now very curious how often folks are using hex editors. And itching for an excuse to open a file that way. :D

MisterTea yesterday at 2:15 PM

> compare that to one with colors:

The colors make it worse as I'm red-green colorblind. Looking at that mess is eye strain.

Honestly I mostly prefer syntax highlighting turned off as it causes eye strain. I have found the black on light yellow theme of the Acme editor to be a very comfortable monochrome color scheme.

deepsun today at 1:43 AM

Rob Pike says colors add cognitive load. He calls syntax highlighting "juvenile".

whizzter yesterday at 12:55 PM

Anyone tried using Kaitai descriptions? It seems like a fairly flexible system that would be an excellent starting point for a hex-editor that wants to add good higher level coloring (and perhaps even editing).

stronglikedan yesterday at 2:21 PM

> go ahead, try to find the single C0 in these bytes:

Ctrl+C, Ctrl+F, Ctrl+V... Easy!


PunchyHamster yesterday at 11:46 AM

I wonder how hard it would be to color code repeating sequences
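Not very hard, at least naively. A Python sketch, assuming fixed-length n-grams: record every n-byte window, keep the ones that occur more than once, and give each a group id that can index a palette. Longer repeats show up as several overlapping groups; the names and the "first id wins" rule are choices made for this example.

```python
from collections import defaultdict


def repeated_ngrams(data: bytes, n: int = 4) -> dict:
    """Map each n-byte sequence that occurs more than once to its start offsets."""
    seen = defaultdict(list)
    for i in range(len(data) - n + 1):
        seen[data[i:i + n]].append(i)
    return {gram: offs for gram, offs in seen.items() if len(offs) > 1}


def group_ids(data: bytes, n: int = 4) -> list:
    """Per-byte group id (one id per repeated sequence, -1 if not repeated).

    A byte covered by several overlapping repeats keeps the first id assigned.
    """
    ids = [-1] * len(data)
    for gid, offsets in enumerate(repeated_ngrams(data, n).values()):
        for start in offsets:
            for i in range(start, start + n):
                if ids[i] == -1:
                    ids[i] = gid
    return ids


if __name__ == "__main__":
    print(group_ids(b"ABCxABCy", 3))
```

A renderer could then color byte `i` with `palette[ids[i] % len(palette)]` whenever `ids[i] != -1`, where `palette` is whatever list of colors the tool uses.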

amelius yesterday at 10:27 PM

Or maybe use an LLM to auto-format your hex dumps based on structure?

azalemeth yesterday at 9:01 AM

I really like hexyl [1], which does this by default.

https://github.com/sharkdp/hexyl

vesui yesterday at 9:44 PM

Fantastic writeup, and it's so cool to see you at the front of HN! This article compels me to add colored syntax highlighting for hex code blocks on my own blog...

asibahi yesterday at 8:52 AM

When I read this article a few days ago, it inspired me to create my own hex viewer: https://ar-ms.me/thoughts/3sl-a-sweet-hex-utility/

The cool thing about it, IMO (outside of colors), is a `--windows` flag, which separates the hex view into partitions: `-w 2:-3:5` shows the first two bytes on a line, then skips three bytes, then shows the next 5 bytes on a line, then the rest of the file. Easy to use combined with a terminal's up arrow.
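To make the partition semantics concrete, here is a hypothetical Python re-implementation based only on the description above (it is not 3sl's code): positive fields show that many bytes on their own line, negative fields skip bytes, and whatever remains becomes the final chunk.

```python
def split_windows(data: bytes, spec: str) -> list:
    """Split data per a colon-separated window spec such as "2:-3:5".

    Positive N: emit the next N bytes as one chunk (one display line).
    Negative N: skip the next |N| bytes.
    Any bytes left after the last field become one final chunk.
    """
    chunks = []
    pos = 0
    for field in spec.split(":"):
        n = int(field)
        if n >= 0:
            chunks.append(data[pos:pos + n])
        pos += abs(n)
    if pos < len(data):
        chunks.append(data[pos:])
    return chunks


if __name__ == "__main__":
    for chunk in split_windows(bytes(range(12)), "2:-3:5"):
        print(chunk.hex(" "))
```

A real viewer would presumably wrap the final chunk into normal-width rows rather than print it as one line.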

greatgib yesterday at 10:13 AM

To me, the random color on each byte messes with my brain, making it hard to quickly identify C0 or any other value that I could identify more easily in all black.

But coloring based more on the logic of the bytes would be nice.

Possibly the 00 in a shaded grey instead of black; in the best-case scenario, coloring by logical unit based on your protocol, and in the worst case, by groups of words or so.

xyx0826 yesterday at 9:30 AM

If you analyze binary files often, I highly recommend binvis - http://binvis.io/. It creates a colored minimap for files it loads and has two available arrangements. Pixel color is based on the range of bytes, e.g. ASCII/null bytes/FF bytes. Besides that, it’s a pretty basic hex viewer that runs in your browser. The minimap is extremely powerful for identifying interesting areas and patterns in unknown data.

leetrout yesterday at 8:23 PM

IPython saved me this week when it color-coded

  b'\x100'

It was not obvious to me from the print output that this is \x10 followed by a literal 0.
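The reason is that `\x` in a Python bytes literal takes exactly two hex digits. A quick check (`bytes.hex` with a separator needs Python 3.8 or later):

```python
data = b'\x100'

# \x consumes exactly two hex digits, so this is the byte 0x10
# followed by the ASCII character '0' (0x30), not a single byte 0x100.
assert data == bytes([0x10, 0x30])

print(len(data), data.hex(" "))  # 2 10 30
```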
adv_zxy yesterday at 10:18 AM

radare2 also has excellent hex viewing/editing support, if one manages to grok the usage of it.

a_t48 yesterday at 8:32 AM

I've started doing this with hashes in a CLI I'm working on. For slow prints, it's somewhat helpful https://asciinema.org/a/aD38Pk88CZgSZqtq but for debug dumps with many many hashes it really helps readability and tracking hashes across lines.

barbs today at 6:29 AM

Reminds me of this blog post where someone was trying to reverse-engineer a hand-rolled encryption protocol in a crossword app by colorizing the output and looking for patterns.

https://www.muppetlabs.com/~breadbox/txt/acre.html

7bit yesterday at 10:17 AM

> it’s much easier to pick out the unique byte when it’s a different color! human brains are really good at spotting visual patterns—given the right format

Don't really see the advantage. Unique bytes have no unique meaning across data types.

The only good syntax highlight to me is 00 and perhaps FF. But that's my opinion of course.

Anything else that has no direct relation to what you're looking at is meaningless.

