Zimmie

240 followers, 27 following, 4220 posts since Jan 23, 2023

Zimmie (bob_zim@infosec.exchange)
4/14/2026
Test equipment is also so much better and more accessible. My multimeter has been annoying me (it takes *six* AA cells!), so I got a combination multimeter, signal generator, and oscilloscope for under $75 shipped. The oscilloscope part has limitations, but it’s extremely capable for the price. 50 MHz bandwidth on each of two channels, Y-T and X-Y modes, the ability to do math on the two signals and show the result as a third trace, live FFT, the ability to save a waveform for later viewing on a computer, all kinds of stuff.
Logic analyzers have also gotten really good and really cheap.
Zimmie (bob_zim@infosec.exchange)
4/4/2026
I have a small new project which needs a little compute and a Bluetooth transceiver, so I picked up a pack of ESP32-C6 “supermini” boards from a Chinese seller. I’ve seen boards like this pop up and then disappear when the seller moves on to something else, but this seller also sent a solid documentation package: schematics, board footprint, and datasheets for all the chips down to the voltage regulator. They were something like $3.20 each.
The microcontroller world is so much better than it was when I was learning.
Zimmie (bob_zim@infosec.exchange)
3/19/2026
A lot of people criticize iOS and Android for making it harder to learn the low levels of how computers work. I’ve got to say, though, modern microcontrollers are so cheap and powerful it’s unreal. My first microcontroller was a 4 MHz PIC with 16 *bytes* of RAM and 256 *bytes* of storage for the program.
Today, for $19, you can get a pack of three ESP32-S3 units. Two primary cores at 240 MHz, a third core at ~20 MHz, 512 kB of RAM, 384 kB of main storage, 8 MB of SPI flash, all kinds of built-in peripherals (UARTs, SPI, I2C, even WiFi and Bluetooth). Learning how computers actually work has never been easier for people who want to know!
Zimmie (bob_zim@infosec.exchange)
11/8/2025
retooted Kolombiken
My dad who is 81 recently got into trouble cause the app where he pays his bills decided that his phone is too old.
He has a fully functional phone that’s a couple of years old but he now needs to throw it away because of software.
I think this is bullshit.
Zimmie (bob_zim@infosec.exchange)
10/30/2025
I just had a brief conversation involving how reads work under #ZFS, how flash-based SSDs actually perform, and how this means cache devices are only rarely useful. Thought I would elaborate a bit and share more broadly.
First, a quick description of ARC, the Adaptive Replacement Cache. It’s the ZFS read caching layer. It involves an index and two data pools. One pool stores most recently used data. The other pool stores most frequently used data, even if it wasn’t used recently. Any time data is read, the system checks the ARC index to see if the data is cached. If it is, the copy in RAM is returned. Otherwise, the storage is read. For most workloads, this combination works *really* well. The data you request is almost always in RAM, and only a small percentage of requests need to hit the much slower storage.
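A toy Python sketch of the two-list idea, assuming a fixed split between the lists. It is deliberately not real ARC: the actual algorithm also tracks recently evicted entries ("ghost" lists) and adapts how much space each list gets, which this sketch leaves out. The class name and the `backing_read` callback are hypothetical.

```python
from collections import OrderedDict


class ToyTwoListCache:
    """Toy read cache with a recency list and a frequency list.

    Simplified on purpose: real ARC also keeps ghost lists of recently
    evicted keys and adapts the split between the lists. Here each list
    has a fixed capacity.
    """

    def __init__(self, recent_capacity, frequent_capacity, backing_read):
        self.recent = OrderedDict()    # blocks read once, most recent last
        self.frequent = OrderedDict()  # blocks read two or more times
        self.recent_capacity = recent_capacity
        self.frequent_capacity = frequent_capacity
        self.backing_read = backing_read  # hypothetical slow-storage reader
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.frequent:
            self.frequent.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.frequent[key]
        if key in self.recent:
            # Second read: promote from the recency list to the frequency list.
            value = self.recent.pop(key)
            self._insert(self.frequent, self.frequent_capacity, key, value)
            self.hits += 1
            return value
        # Miss: read from slow storage and remember the block as recent.
        self.misses += 1
        value = self.backing_read(key)
        self._insert(self.recent, self.recent_capacity, key, value)
        return value

    @staticmethod
    def _insert(entries, capacity, key, value):
        entries[key] = value
        entries.move_to_end(key)
        while len(entries) > capacity:
            entries.popitem(last=False)  # evict the least recently used entry


cache = ToyTwoListCache(2, 2, backing_read=lambda key: f"data-{key}")
for key in ["a", "b", "a", "c", "d", "a"]:
    cache.read(key)
```

Block "a" is read twice early, so it moves to the frequency list and survives the one-off reads of "c" and "d" that push "b" out of the recency list. The final read of "a" is served from RAM, giving 2 hits and 4 misses for this sequence.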
ZFS cache devices are often called L2ARC. They don’t really use ARC logic, though. Instead, cache devices are populated by data which is about to fall out of ARC. This is a constant trickle of data (usually about 10 MB/s) being written to the cache device. An index of what data is on the cache device is also stored in RAM. The amount of RAM spent on the index depends on the record size (100 GB of small records costs more to index than 100 GB of big records). Now, when data is requested, the system checks the ARC index, then the cache index, then the storage. If the cache index says the cache device has the data, it’s read from there rather than from the storage.
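A minimal Python sketch of that lookup order and of the index arithmetic. The dictionaries and list stand in for the real structures, and `header_bytes` is a hypothetical round number (the real per-record overhead depends on the OpenZFS version). The point is only that index cost scales with the number of records, so 8 KiB records cost 16 times as much RAM to index as 128 KiB records for the same 100 GiB of cached data.

```python
def tiered_read(key, arc, l2arc_index, l2arc_device, pool):
    """Return (source, data), checking ARC, then the cache index, then the pool."""
    if key in arc:
        return "arc", arc[key]
    if key in l2arc_index:
        # The index lives in RAM; it points at where the block sits on the cache device.
        return "l2arc", l2arc_device[l2arc_index[key]]
    return "pool", pool[key]


def l2arc_index_bytes(cached_bytes, record_size, header_bytes):
    """RAM needed to index a cache device, assuming one fixed-size header per record."""
    return (cached_bytes // record_size) * header_bytes


GiB = 2**30
KiB = 2**10
# header_bytes=100 is a made-up value for illustration, not the real OpenZFS figure.
small_records = l2arc_index_bytes(100 * GiB, 8 * KiB, header_bytes=100)
big_records = l2arc_index_bytes(100 * GiB, 128 * KiB, header_bytes=100)
print(small_records // big_records)  # 16
```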
Everybody who has dealt with spinning disks knows they aren’t very good at reading and writing at the same time: the heads have to seek all over the disk. Unfortunately, SSDs have a very similar problem. Give them an all-read or all-write workload and they’re really fast, but mix in even 10% of the other and performance craters. Tom’s Hardware did an excellent test of an Optane drive a few years ago (https://www.tomshardware.com/reviews/intel-optane-ssd-900p-3d-xpoint,5292-2.html) which demonstrated this problem. I’ve attached one of the more interesting performance graphs.
Note that at a 90% read workload, performance of the non-Optane drives drops to less than half of what it is at 100% read. ZFS cache vdevs have a constant low-level write workload, so they never get to run a pure 100% read workload.
Meanwhile, the devices that provide capacity to a ZFS pool only receive writes when a transaction group closes. By default, this happens every five seconds, or sooner if the transaction group fills early. The rest of the time, there’s no writing. This leaves them free for a 100% read workload for seconds at a time, even under heavy write load.
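The difference in write patterns can be put as a rough duty-cycle estimate. This sketch assumes a pool device writes only while a transaction group is being flushed; the 5-second interval is the default mentioned above, while the 1-second flush time is a hypothetical number.

```python
def pure_read_fraction(txg_interval_s, flush_s):
    """Fraction of time a pool device sees only reads.

    Assumes writes reach the pool devices only while a transaction group is
    being flushed, and that each flush takes flush_s seconds.
    """
    if flush_s >= txg_interval_s:
        return 0.0  # the device is always writing
    return (txg_interval_s - flush_s) / txg_interval_s


# Hypothetical numbers: the default 5 s interval and a 1 s flush.
print(pure_read_fraction(5.0, 1.0))  # 0.8
# Writing all the time, like a cache device fed by a constant trickle.
print(pure_read_fraction(5.0, 5.0))  # 0.0
```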
The result of this is that to provide benefit, a cache device has to be significantly faster than the pool devices. With SATA SSDs providing capacity, even an NVMe SSD is probably too slow to be a good cache device.
Zimmie (bob_zim@infosec.exchange)
10/15/2025
I wish to join the bunnies.
Zimmie (bob_zim@infosec.exchange)
10/15/2025
I shared this most recent story to remind myself of it because this week has been anything but fun troubleshooting. Just slog after slog. I get brought into some issue, and can tell in under an hour it’s not a problem in any of the gear anyone on my team owns or can access. Four hours later, we finally manage to lead the other team to the problem in their environment. One in another section of my company, one in a customer’s environment, and two so far in *vendors’* environments. Why am I the one finding problems in our vendors’ environments?
Then this F5 alarm fire breaks out.
Zimmie (bob_zim@infosec.exchange)
10/14/2025
And my NTP story for anybody who has somehow missed it:
https://infosec.exchange/@bob_zim/111862834586135218
Zimmie (bob_zim@infosec.exchange)
10/14/2025
Had another minor troubleshooting adventure recently! Not quite as exciting or far-ranging as my NTP story, but it was still fun.
I get pulled into a call. Something happened over the weekend, and I’m getting too much information from too many people. They’re asking me about load balancers, WAN circuits, a bunch of firewalls, and more. Not sure yet what actually happened. I have them slow down a bit and give me one thing which is affected. I confirm the path, and it goes through one firewall, over a WAN link to another datacenter, through another firewall, then to the server. No load balancers! It also skipped several other firewalls people had been asking about.
I ask for more problematic traffic flows. Eventually, I find an even simpler one: in one interface on the second firewall and out another interface on that same firewall. No load balancers. No routers. Just switches and one firewall.
I start packet captures on all the relevant interfaces. While waiting for a report of issues between those endpoints, we realize something interesting. You see, a bunch of separate teams have all been troubleshooting issues on their own. They’ve been keeping track of the issue timings separately. When we get enough together, we see that while the affected flows are rarely the same, *something* is happening every 20 minutes like clockwork.
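Pooling the timestamps is what exposed the pattern. One way to check a suspected cycle is to fold every report onto a single period and see how tightly the reports cluster. A minimal sketch; the report times below are invented for illustration.

```python
from datetime import datetime, timedelta


def phase_spread(timestamps, period):
    """Seconds spanned by the events once folded onto one cycle of `period`.

    A small spread means the events line up with a fixed cycle. The fold is
    circular, so events just before and just after a cycle boundary count as close.
    """
    p = period.total_seconds()
    ref = min(timestamps)
    phases = sorted((t - ref).total_seconds() % p for t in timestamps)
    # Gaps between neighbouring phases, including the wrap-around gap.
    gaps = [b - a for a, b in zip(phases, phases[1:])]
    gaps.append(phases[0] + p - phases[-1])
    return p - max(gaps)


# Made-up issue reports gathered from several teams on one arbitrary day.
reports = [datetime(2025, 1, 1, h, m) for h, m in
           [(9, 2), (9, 41), (10, 22), (11, 1), (11, 40)]]
print(phase_spread(reports, timedelta(minutes=20)))  # 120.0
print(phase_spread(reports, timedelta(minutes=25)))  # 1140.0
```

Folded onto a 20-minute cycle, these reports all fall within a 2-minute span; on a 25-minute cycle they spread across 19 minutes.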
The owner of the systems I’m watching reports they’ve had a traffic issue, so I kill the captures and pull them down. After a few seconds in the client-side capture, I’ve found the problem. Sure enough, some traffic just goes missing. Client sends some data, server doesn’t respond for several seconds. I look at the capture on the server-side interface. It’s different. The server sends the packet which was missing on the client-side capture. And the server-side capture doesn’t have some packets the client sent. So the firewall is definitely dropping or severely delaying some traffic. What gives?
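Comparing the two captures is essentially a set difference in each direction. A minimal sketch, assuming each packet has already been reduced to a direction label and a key that survives the trip through the firewall (for example a TCP sequence number, which only works if the firewall isn't rewriting sequence numbers). The packet values and the `missing_between` helper are hypothetical.

```python
def missing_between(sent_side, received_side, direction):
    """Keys of packets going `direction` seen on sent_side but never on received_side."""
    seen = {key for d, key in received_side if d == direction}
    return [key for d, key in sent_side if d == direction and key not in seen]


# Hypothetical captures: (direction, packet key) pairs.
client_side = [("c2s", 1000), ("c2s", 1100), ("s2c", 5000)]
server_side = [("c2s", 1000), ("s2c", 5000), ("s2c", 5100)]

print(missing_between(client_side, server_side, "c2s"))  # [1100]
print(missing_between(server_side, client_side, "s2c"))  # [5100]
```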
Shortly before the issue is expected to occur, I start a debug which gives me a brief message about every packet the firewall drops along with a timestamp. I wait a few minutes, issues are reported, issues go away, and I kill it.
Copy it down, open it up, and I start by just scrolling through. The timestamps zip by, then I start seeing a lot of the same second. Then a *LOT* of the next second. I let it run for about five minutes, and 95% of the file has timestamps from a 15 second span. Huh.
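Scrolling works, but the same observation can be made mechanically: count drops per second and find the densest window. This sketch assumes a hypothetical log format in which every line begins with an HH:MM:SS timestamp; the real debug format is vendor-specific, and the sample lines are synthetic.

```python
from collections import Counter


def to_seconds(stamp):
    """Convert an HH:MM:SS string to seconds since midnight."""
    h, m, s = (int(part) for part in stamp.split(":"))
    return h * 3600 + m * 60 + s


def busiest_window(lines, width):
    """Return (start_second, drops) for the width-second span with the most drops.

    Assumes (hypothetically) that every debug line starts with HH:MM:SS.
    """
    per_second = Counter(to_seconds(line[:8]) for line in lines if line.strip())
    best_start, best_total = None, -1
    for start in sorted(per_second):
        total = sum(per_second.get(start + i, 0) for i in range(width))
        if total > best_total:
            best_start, best_total = start, total
    return best_start, best_total


# Synthetic debug output: a few scattered drops and one dense burst.
lines = (["10:00:00 drop"] + ["10:00:05 drop"] + ["10:01:00 drop"] * 50
         + ["10:01:10 drop"] * 45 + ["10:03:00 drop"])
start, drops = busiest_window(lines, width=15)
print(start, drops, f"{drops / len(lines):.0%}")  # 36060 95 97%
```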
This particular firewall brand works like a load balancer and a bunch of tiny firewalls in a box. It has worker processes which actually handle applying the rules, inspecting the data, and so on, and it has dispatcher processes which pick which traffic goes to which worker. A lot of the drops are problems enqueueing packets for a particular worker. And it’s all the same worker.
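A sketch of why one noisy flow hurts only some of the others, assuming the dispatcher hashes each flow's 5-tuple to pick a worker. The hash used here (SHA-256 of the tuple) and the sample flows are hypothetical; the real dispatcher uses its own vendor-specific scheme.

```python
import hashlib


def pick_worker(flow, n_workers):
    """Map a flow tuple to a worker index; the same flow always gets the same worker."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_workers


def share_on_same_worker(noisy_flow, other_flows, n_workers):
    """Fraction of other flows that land on the noisy flow's worker."""
    target = pick_worker(noisy_flow, n_workers)
    hits = sum(1 for flow in other_flows if pick_worker(flow, n_workers) == target)
    return hits / len(other_flows)


# Hypothetical flows: (src, dst, proto, src_port, dst_port).
noisy = ("10.2.0.5", "10.3.0.9", "udp", 514, 514)
others = [(f"10.0.{i // 256}.{i % 256}", "10.1.0.10", "tcp", 40000 + i % 1000, 443)
          for i in range(20000)]
share = share_on_same_worker(noisy, others, 65)
print(f"{share:.1%}")  # expect something near 1/65, i.e. about 1.5%
```

Because the mapping is deterministic, every packet of the noisy flow lands on the same worker, while only the unlucky flows that hash to that worker share its queue.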
I look for that worker in the debug, and I find about 200k drops, all in the span of 15 seconds. None of the drops are for flows reporting problems, though. I do a little RegEx witchery to get just the source IP, destination IP, and destination port, then sort the result and pipe it to ‘uniq -c’ to get a feel for heavy hitters. Lots of tuples which appear one or two times, then two which appear 98,000 times each. It’s UDP 514 traffic from one client to a few destinations which were decommissioned some time ago. So this client is basically DoSing the firewall worker process, which prevents it from handling the other flows which are directed to it. The firewall has around 65 of these workers, so that explains why the issue is so sporadic! It only affects whatever ~1.5% of traffic is on the affected worker, and that changes all the time.
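The `uniq -c` step is a frequency count. A rough Python equivalent, assuming a hypothetical line format with `src`, `dst`, and `dport` key=value fields (the actual debug output differs by vendor), plus the arithmetic behind the ~1.5% figure. The sample lines are synthetic.

```python
import re
from collections import Counter

# Hypothetical drop-line format; the real debug output is vendor-specific.
TUPLE_RE = re.compile(r"src=(\S+) dst=(\S+) dport=(\d+)")


def heavy_hitters(lines, top=3):
    """Extract (src, dst, dport) from each line, count them, and keep the largest counts."""
    counts = Counter()
    for line in lines:
        match = TUPLE_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts.most_common(top)


lines = (["drop src=10.5.5.5 dst=10.9.0.1 dport=514"] * 5
         + ["drop src=10.5.5.5 dst=10.9.0.2 dport=514"] * 4
         + ["drop src=10.7.7.7 dst=10.8.0.3 dport=443"])
for (src, dst, dport), n in heavy_hitters(lines):
    print(n, src, dst, dport)

# One worker out of about 65 handles roughly 1/65 of the flows.
print(f"{1 / 65:.1%}")  # 1.5%
```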
UDP port 514 is used for syslog. While waiting on someone from the team which owns the client sending all this traffic, I run another capture. I limit it to 500 packets, and I look for any traffic from that client to the two destinations it is trying to reach. It runs for a few minutes, then stops. I pull down the file and open it up.
Sure enough, it’s syslog traffic. It’s all from a sendmail process complaining it has no more space available in the filesystem which contains the mail spool. It can’t move messages around or accept any more mail.
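Each classic syslog datagram on UDP 514 starts with a `<PRI>` field that encodes facility and severity, which makes a quick triage check on a capture like this one; sendmail typically logs under the mail facility. A small decoder sketch, with a made-up sample message:

```python
import re

# RFC 5424 facility codes (subset); 16-23 are local0-local7.
FACILITIES = {0: "kern", 1: "user", 2: "mail", 3: "daemon", 4: "auth"}
FACILITIES.update({16 + n: f"local{n}" for n in range(8)})
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def decode_pri(message):
    """Split a syslog message's leading <PRI> into (facility, severity).

    PRI = facility * 8 + severity, per RFC 3164 and RFC 5424.
    """
    match = re.match(r"<(\d{1,3})>", message)
    if not match:
        raise ValueError("message does not start with a <PRI> field")
    facility, severity = divmod(int(match.group(1)), 8)
    return FACILITIES.get(facility, f"facility{facility}"), SEVERITIES[severity]


# Hypothetical message; only the <19> prefix matters here.
print(decode_pri("<19>example text"))  # ('mail', 'err')
```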
The team which owns it joins and confirms they don’t need it, because they never actually finished setting it up. They approve shutting it off. 20 minutes later, no reports of problems! An hour later, and it’s officially declared resolved.
Lessons learned (really, reinforced):
• Document, document, document! Troubleshooting is a science, and that means you should follow the scientific method. Document your observations, tests you made, and what the tests told you. Document changes to your variables.
• Isolate a simple part of the issue and focus on that. If you find an even simpler version, change your focus to that simpler version. Once you learn what’s going on there, you can usually generalize it to the broader problem.
• Set up monitoring for your stuff early. Ideally before you set up any services or cronjobs on it.
• Decommission stuff you don’t actually use.
• It can be really hard to tell when an issue is just your problem versus when it’s something broader. If you’re dealing with weird, sporadic issues, record the timing and see if anybody else in similar teams has weird, sporadic issues. See if they mesh in any way.
Zimmie (bob_zim@infosec.exchange)
1/27/2025
@nina_kali_nina I remember we had a conversation a bit ago about watches, Pebble, and Watchy. Looks like a group at Google has managed to open source a bunch of the code involved in the Pebble watch, and some other people (including Eric Migicovsky, the founder) are planning to make new hardware!
https://repebble.com/
Zimmie (bob_zim@infosec.exchange)
1/6/2025
“I am aware that this alteration is comprehensive, and irreversible.”
Zimmie (bob_zim@infosec.exchange)
1/5/2025
Watching Severance again ahead of the second season’s debut. Everything about it is so, so good.
Zimmie (bob_zim@infosec.exchange)
11/4/2024
The FCC just published an exciting enforcement advisory!
“REMINDER: amateur and personal radio service licensees and operators may not use radio equipment to commit or facilitate criminal acts”
https://docs.fcc.gov/public/attachments/DA-24-1122A1.pdf
#amateurradio #hamradio