FITEBOX – How We Built an Autonomous Conference Recording Box
This post describes how FITEBOX was born, how it works, and how you can build one yourself. I wrote it as a complete guide so anyone running community events can understand both...
Filter by Category
Filter by Author
This post describes how FITEBOX was born, how it works, and how you can build one yourself. I wrote it as a complete guide so anyone running community events can understand both...
Posted by Juanmi Taboada
The Challenge I recently took on the challenge of modernising a Labelmate PM-300-CS dispensing machine. Ideally, I wanted to control this industrial hardware from my computer, so...
Posted by Juanmi Taboada
Conectar un motor paso a paso (Stepper) a un microcontrolador usando MicroPython abre la puerta a infinidad de proyectos de robótica, CNC, impresoras 3D o automatización. En este...
Posted by Juanmi Taboada
Debugging and profiling are essential components of software development. They allow you to identify and correct errors and optimise code performance. In this guide, we will...
Posted by Juanmi Taboada
Proper management of secrets, such as passwords, API keys, and certificates, is crucial for system security. In distributed environments and cloud deployments, mishandling...
Posted by Juanmi Taboada
In the last few days, I have been looking for a 3D Printer to bring some of my inventions to life. After reading and comparing, I went for an Anycubic Kobra 2 Neo. It is a great...
Posted by Juanmi Taboada
Build your own None Cat Auth (none cat is authorized, a respectful repelling cats system). This is a hardware implementation to repel cats in a non-harmful way. It shouldn’t...
Posted by Juanmi Taboada
Precision vs. Creativity: Navigating the Landscape of AI Language Models in Problem-Solving In the ever-evolving landscape of AI language models, precision and creativity stand as...
Posted by Juanmi Taboada
This post contains the conclusions of the Alioli ROV Submarine Drone and shows images and videos of it in the water. I wrote it as a diary so anybody can understand that Alioli...
Posted by Juanmi Taboada
Examples of how to work with JSON, YAML, CSV, and XML files in Python. Today I saw myself preparing some exercises for my student who is learning Python programming language, and...
Posted by Juanmi Taboada
This post describes how FITEBOX was born, how it works, and how you can build one yourself. I wrote it as a complete guide so anyone running community events can understand both the journey and the result. FITEBOX was not a straight project: it started with failures, dead hardware, lost recordings, and a lot of stubbornness. What you see here is the outcome of that process.
In my experience working with OpenSouthCode (an annual Open Source conference in Málaga, Spain), I learned that recording conference talks should not cost 40% of your event budget. FITEBOX is my answer to that problem: a Raspberry Pi 5 box that records talks autonomously, composite video with slides and speaker camera, mixed audio, branded overlays, and optional live streaming, all managed from a tiny OLED screen or a web UI on your phone.
One box per room. A volunteer plugs it in. No dedicated operator needed.
The full source code is available at github.com/juanmitaboada/fitebox under the Apache 2.0 license.

Disclaimer! This is an extensive reading. Grab a coffee. ☕

OpenSouthCode is a community conference. We are a non-profit association with no steady income. We depend entirely on sponsors who support us each year. For years, we hired professional recording crews to capture the talks across multiple rooms. The cost was high. At its peak, video recording consumed nearly 40% of the total event budget. That is money that should go to catering, venue, and making the conference better for attendees. It was unsustainable.
In 2025, David Santos and I decided to fix this. David had the vision for the hardware: a small box based on a Raspberry Pi 5 with an OLED display and physical buttons, capable of autonomously recording a conference room. He designed the hardware architecture, selected the components (the SSD1306 OLED, the GPIO buttons, and the capture card, which was the same one used at FOSDEM conferences), and built the first physical prototype, including the 3D-printed enclosure. The hardware foundation of FITEBOX is the result of David’s work.
We both started with FFmpeg because it seemed like the natural choice. When FFmpeg didn’t perform as expected, we both moved to OBS. David saw that OBS gave better results.
Here is where things went sideways. 🙃
What neither of us knew at the time was that the Raspberry Pi 5 has no hardware H.264 encoder. The Pi 4 had v4l2m2m (a hardware encoding path). But the Pi 5 dropped it entirely; it kept only the decoder.
On top of that, we used cheap Chinese microphones that had a reputation we didn’t know about: they tend to fail after 30 minutes of continuous use. During the event, mics started dying mid-talk.
I handed my FITEBOX unit to a volunteer who was managing the recording. But there was no user interface of any kind, no web panel, no OLED menus, nothing. It was all experimental. The volunteer struggled to control the hardware, and when things went wrong, there was no feedback from the system to explain what was happening.
In my FITEBOX, I developed a system that copied OBS screen templates at startup so we could choose between schemas, but I didn’t prevent misconfigurations due to hardware changes. OBS in my FITEBOX silently misconfigured itself on the first day. It wouldn’t warn about profile mismatches or recording failures. Talks were “recorded”, but the files showed either black screens (because of a laptop resolution mismatch and the webcam cover on) or a full template misconfiguration (videos positioned where they are not supposed to be).
This is how we expected them all to look:


We found some of the recordings dismantled like this:


We also spent time investigating RISC-V boards donated to the association, with the aim of using them for the recording infrastructure. That turned out to be a dead end for this particular use case because they arrived too late to the party 🎉 and we didn’t have enough time to prepare a working version.




The result of OpenSouthCode 2025: we lost nearly half of the recorded talks.
I was frustrated but stubborn. I knew the Raspberry Pi 5 had enough CPU power to do this with FFmpeg; it just needed the right software approach and some love. A few weeks ago, I started digging into the actual performance of FFmpeg on the Pi 5 with the help of Claude AI and Gemini as a pair-programming partner. That is when I discovered the missing hardware encoder, and also that there were several other things to tune: PCIe bus generation (Gen3 for the NVMe), USB buffer management, ALSA audio timing, and many more things that are invisible until you actually measure them.
I recovered my motivation and went deep. Weeks of development followed 36 versions of the recording engine, many manual tests on real hardware, and a complete web interface with real-time monitoring. FITEBOX grew from a rough experiment into a professional-grade recording and streaming system.
This post is the full story of how it works. 😎
Before we get into the build process, let me show you what a fully configured FITEBOX produces. I think it is important to see the destination before starting the journey.
The recording output is a 1920×1080 composite video with:
The file is written as an MKV (Matroska) file, which is crash-resistant if power fails; everything recorded up to that point is recoverable.


Here is the Bill of Materials with the actual prices I paid. Every link goes to the exact product.
| Component | Price | Link |
|---|---|---|
| Raspberry Pi 5 16GB | 166,99 € | AliExpress |
| Official RPi5 Power Supply (USB-C 5V/5A) | 20,78 € | Amazon |
| MicroSDXC 128GB (A2, U3 – only for initial boot config) | 17,75 € | Amazon |
| PCIe to NVMe HAT | 15,99 € | Amazon |
| NVMe SSD 500GB (Crucial P3) | 38,99 € | Amazon |
| ARCTIC M2 Pro NVMe Heatsink | 5,79 € | Amazon |
| Active Cooler Fan for RPi5 | 4,59 € | AliExpress |
| USB 3.0 HDMI Capture with Loop Out | 21,15 € | AliExpress |
| OLED SSD1315 128×64 with 4 integrated keys (I2C) | 2,22 € | AliExpress |
| 30×30mm 2-pin fans ×2 (for 3D printed case) | 6,49 € | Amazon |
| RTC Rechargeable Battery | 12,19 € | AliExpress |
Core subtotal: ~313€
| Component | Price | Link |
|---|---|---|
| USB 3.0 right-angle extension | 11,39 € | Amazon |
| Short USB 3.0 cable (left-to-right) | 4,89 € | AliExpress |
| USB-C panel mount extension (with screw holes) | 2,99 € | AliExpress |
| Micro HDMI panel mount extension | 4,29 € | AliExpress |
| USB-C extension cable (panel mount, 2-pack) | 14,99 € | Amazon |
| ARCTIC MX-6 Thermal Paste (4g) | 8,45 € | Amazon |
Cables subtotal: ~47€
| Component | Price | Link |
|---|---|---|
| Webcam 2K@60FPS with AI tracking | 79,98 € | Amazon |
| Godox MoveLink M2 Wireless Lavalier Mic (2.4GHz) | 76,39 € | AliExpress |
| Behringer UCA222 USB Audio Interface (required for Godox mic) | 36,98 € | Amazon |
Optional subtotal: ~193€
| Configuration | Price |
|---|---|
| Core + cables (minimum working system) | ~360€ |
| Core + cables + webcam + mic + sound card (full production) | ~553€ |
Yes, the full production setup is more than the initial 200€ dream. Once you want clean panel-mount connections, a decent webcam, a wireless microphone that doesn’t die after 30 minutes (we learned that lesson at OpenSouthCode 2025), and a proper USB audio interface to connect it, the cost per box reaches 400-550€. Still a fraction of hiring a professional crew at 2.000-5.000€ + per event, and you keep the equipment forever, plus a valuable contribution to OpenSource.
Why NVMe via PCIe HAT instead of USB SSD? The Raspberry Pi 5 has a PCIe lane that supports Gen 3 speeds. An NVMe SSD connected via a PCIe HAT delivers much higher throughput than USB 3.0 and, critically, does not compete with the capture card and webcam for USB bus bandwidth. The setup.sh script enables PCIe Gen 3 with dtparam=pciex1_gen=3 in the firmware config. A 500GB NVMe gives you roughly 500 hours of recording at ~1 GB/hour.
Why the OLED module with integrated keys? This is an elegant find: a single PCB that combines the SSD1315 OLED display (128×64, I2C) with 4 tactile buttons, all soldered to the board. One module, one cutout in the case, one clean panel. The buttons connect to GPIO pins (BCM 26, 16, 20, 19) and the display connects via I2C (SDA/SCL). The whole thing costs 2,22€.
Why the HDMI capture card with loop out? Because it uses a chipset that outputs MJPEG (not raw YUV) over USB. MJPEG is much lower bandwidth, which matters when you have two video devices and a sound card sharing the USB bus. Loop-out (HDMI passthrough) means the presenter’s laptop continues to feed the projector, with no splitter required.
Why the official RPi5 power supply? The Raspberry Pi 5 needs a stable 5V at 5A. Cheap USB-C adapters cause voltage drops under load → CPU throttling (encoding speed drops below real-time), USB device disconnects (your capture card vanishes mid-recording), and data corruption. I lost several test recordings before switching to the official PSU. Do not cheap out on the power supply.
Why a wireless mic + USB audio interface? Because cheap wireless USB microphones die. We learned this at OpenSouthCode 2025 when multiple mics failed after 30 minutes of continuous use. The Godox MoveLink M2 is a professional 2.4GHz wireless system that clips to the speaker’s collar. Its receiver outputs analogue audio, so you need a USB audio interface (the Behringer UCA222) to get the signal into the Pi. FITEBOX’s detect_audio.sh auto-classifies the Behringer as a sound_card and gives it top priority over webcam audio.
Why panel-mount cables? Because all external connections (USB-C power, HDMI in/out, USB for webcam and mic) should be accessible on the outside of the case with screw-secured connectors, while the internal wiring stays clean and stable. This is what separates a production appliance from a prototype.




David Santos designed the FITEBOX enclosure, and we printed it on our 3D printers. The STL files are included in the repository. The design has some notable features that are visible in the photos:


We chose a clean, permanent build philosophy. No breadboards, no flying Dupont wires, no tape. This is equipment that a volunteer will handle at a real event, and it must work reliably. If something comes loose during the conference, you lose all recordings for that room for the rest of the day.
This is the core of the build. The Raspberry Pi 5 sits at the top, with the PCIe NVMe HAT mounted under via brass standoffs, connected to the Pi’s PCIe connector via an FPC ribbon cable. The NVMe SSD slots into the HAT, and the ARCTIC M2 Pro heatsink goes on top of the SSD.
Apply thermal paste (usually included with the heatsink) between the SSD and its heatsink. Install the active cooler fan on the Pi 5’s CPU. The result is a compact two-board stack.




Secure this stack to the base of the enclosure.
Mount the two 30×30mm fans in the case’s designated ventilation areas. These are powered by the Pi’s 5V/GND GPIO pins (or a fan header if available). Orient them to blow outward. The active cooler on the Pi draws in air, and the exhaust fans push it out.
FITEBOX uses an integrated module that combines the SSD1315 OLED display (128×64) and 4 tactile buttons on a single PCB. This eliminates the need for separate button wiring; everything runs through a single connector strip.
The module connects to the Pi’s GPIO header with 8 wires (a single ribbon cable):
| Module Pin | Raspberry Pi Pin | GPIO |
|---|---|---|
| GND | Pin 39 | Ground |
| VCC | Pin 1 | 3.3V |
| SCL | Pin 5 | GPIO3 (I²C) |
| SDA | Pin 3 | GPIO2 (I²C) |
| K4 (Select) | Pin 35 | GPIO19 |
| K3 (Down) | Pin 38 | GPIO20 |
| K2 (Up) | Pin 36 | GPIO16 |
| K1 (Back) | Pin 37 | GPIO26 |
The module fits into a single rectangular cutout in the top panel of the case, secured with 4 screws. The buttons face outward and are easily accessible. No external resistors needed – the software uses the Pi’s internal pull-up resistors via gpiod.




The HDMI capture card sits in its own sub-compartment inside the case, in a separate 3D-printed enclosure mounted next to the Raspberry Pi. This keeps it thermally isolated (the capture card has its own heatsink, I got from a Pi kit) and mechanically secure.


Connect the capture card’s USB output to the Pi via an internal short USB 3.0 cable. The HDMI IN and HDMI OUT ports face the edge of the case, where panel-mount HDMI connectors provide clean external access.
The panel-mount cables provide clean external access. From outside the case:
The NVMe SSD is internal (connected via the PCIe HAT), so it does not use any USB ports. This frees the USB bus for the capture card, webcam, and audio devices.
Route cables neatly, verify the 30mm exhaust fans are connected and oriented correctly (blowing out), and close the enclosure. Verify that the OLED display is visible and all buttons are accessible through the top panel cutout.
FITEBOX boots directly from the NVMe SSD; do not use an SD card in production. Connect the NVMe (via USB adapter or the PCIe HAT on another Pi) and flash Raspberry Pi OS Lite (64-bit, Debian 12 Bookworm) to it using Raspberry Pi Imager. Enable SSH and set a hostname like fitebox.


Then boot the Pi with a temporary SD card to set PCIe:
$ sudo raspi-config # → Advanced Options → PCIe Speed → Yes # (Enables the PCIe connector and forces Gen 3.0 speeds)
Set the boot order:
$ sudo raspi-config # → Advanced Options → Boot Order → B1: SD Card Boot # (Boot from SD card if available, otherwise boot from NVMe)
Once configured, remove the SD card and the Pi boots from NVMe. Fast, reliable, and the full 500GB is available for both the OS and recordings on the same drive.
The OLED display requires I2C to be enabled:
$ sudo raspi-config # → Interface Options → I2C → Enable
Verify the OLED is detected (should show address 0x3c):
$ sudo apt install -y i2c-tools $ i2cdetect -y 1
Since the OS and recordings live on the same NVMe drive:
$ sudo apt-get install git $ git clone git@github.com:juanmitaboada/fitebox.git
The repository includes a setup.sh script that configures the entire system. It must be run with sudo (it detects your real user via $SUDO_USER):
$ cd fitebox $ sudo ./bin/setup.sh
Here is what the script does, step by step:
curl, wget, git, build-essential, v4l-utils, alsa-utils, bc, jqdocker groupusbfs_memory_mb=1000 (The default 16MB is far too small for two video capture devices on the same bus)vm.swappiness=10, dirty ratios down to 5-10%, real-time scheduling enabled, file limits raised to 65536cgroup_enable=memory In the boot command line for Docker memory managementdtparam=pciex1_gen=3 in config.txt) for faster SSD throughput. Sets boot to console autologinreboot, shutdown, and Plymouth displays messagesrecordings/, log/, run/ with correct ownership and permissions.env: writes your UID/GID so Docker containers create files owned by your user, not rootA reboot is required after setup to apply kernel parameters and PCIe changes. The script tells you this at the end with appropriate urgency. You can relaunch the script several times; it is safe to reuse.
Before building the Docker container, run the diagnostic tools to verify that all hardware is detected and working. FITEBOX includes a full suite of detection scripts:
# Full system diagnostic - generates a timestamped report ./diagnostics.sh # Audio device detection and classification ./detect_audio.sh # OLED display detection and visual test ./detect_oled.sh # GPIO button wiring test python3 detect_buttons.py
diagnostics.sh is the big one. It auto-detects whether it is running on the host or inside a Docker container and adjusts accordingly. It checks: OS and kernel info, Raspberry Pi thermal status and throttling, disk space and recording directory contents, USB device tree, video devices via v4l2-ctl, ALSA audio cards and mixer levels, PulseAudio/PipeWire status (should both be disabled), I2C bus for the OLED, all running FITEBOX processes, Docker container status, health files, recent logs, kernel error messages, network configuration, and CPU/memory usage. The full report is saved to /tmp/fitebox_diagnostic_YYYYMMDD_HHMMSS.txt so you can share it when asking for help.
detect_audio.sh deserves special mention because it has a dual mode: run it directly for a full diagnostic, or source it from another script (like the recording engine) and it exports variables (VOICE_DEV, HDMI_DEV, VOICE_CARD_ID, etc.) silently. It classifies every ALSA card by type (hdmi_capture, webcam, sound_card, usb_mic, generic_usb) and selects the best microphone with a priority order: professional sound card → USB mic → generic USB → webcam (last resort). If the HDMI capture and voice mic end up being the same device (which would cause “Device or resource busy”), it automatically disables HDMI audio and falls back to voice-only mode.
detect_oled.sh scans the I2C bus, finds the OLED at 0x3C or 0x3D, checks Python dependencies (luma.oled, Pillow), and then runs a series of visual tests on the display: clear screen, full white, border, text, animation, and a final info screen that stays on for 60 seconds so you can visually confirm the display is working. It even stops oled_controller.py if it’s running (since the OLED can only be controlled by one process) and restarts it afterwards.
detect_buttons.py uses the FiteboxHardware abstraction to test all 4 GPIO buttons in real-time – press a button and see it reported on screen with PRESS/RELEASE events at 100Hz polling.
$ docker compose build $ docker compose up -d
The container runs in privileged mode because it needs direct access to hardware devices: video (/dev/video*), audio (/dev/snd/*), I2C (/dev/i2c-*), GPIO (/dev/gpiochip*), and thermal sensors. This is not optional; without privileged mode, the container cannot access the capture cards, microphones, OLED display, or buttons.
When the container starts, three processes come up in sequence:
To understand why the recording engine is at version 36, it helps to see what different FFmpeg parameter choices look like in practice. During development, I tested many combinations on real Pi 5 hardware. Here are some examples:
Preset medium vs ultrafast: The medium preset produces better compression (smaller files, slightly better quality) but runs at ~0.7x the speed on the Pi 5, so it cannot keep up in real time. Frames are dropped, audio desyncs, and the recording is unusable. The ultrafast preset runs at ~1.0x speed (barely real-time), but the quality at CRF 28 is perfectly acceptable for conference recordings.
CRF 23 vs CRF 28: CRF 23 produces visually better video, but at roughly double the bitrate (~4.5 Mbps vs ~2.4 Mbps). On a Pi 5, the higher bitrate means more disk I/O and more CPU pressure on the encoder. CRF 28 at 1080p for slides, with a small speaker window, is more than good enough.
With and without aresample=async=1000: This one is only relevant for streaming, but it is dramatic. Without it, the streamed audio has periodic clicks every few seconds – inaudible artefacts from MKV AAC timestamp micro-irregularities that become audible when copied to FLV/RTMP. With async=1000, the audio is perfectly clean. More on this in the Streaming Pipeline section.
The OLED controller (oled_controller.py) is the first process that starts and the central hub of FITEBOX’s internal communication. It manages the SSD1315 display (compatible with SSD1306), reads GPIO button presses, and runs a Unix domain socket server that the other processes connect to.
The OLED is a 128×64 pixel monochrome display driven by the luma.oled Python library. All rendering is done with PIL/Pillow: text, icons, and QR codes are drawn as bitmaps and pushed to the display. The project includes 12 pixel/retro fonts (ChiKareGo, FreePixel, ProggyTiny, DejaVuSansMono, and others) for different display contexts.


When FITEBOX starts, the OLED shows a custom logo bitmap (128×64, 1-bit, base64-encoded in the source code). It is a small thing, but it tells you the system is alive before anything else is ready.


After boot, the OLED cycles through 8 status views using the Up/Down buttons:
| View | What It Shows |
|---|---|
| Overview | Recording state, elapsed time, CPU, temperature |
| System | RAM usage, swap, load average |
| Network | WiFi SSID, IP addresses, signal strength |
| Storage | SSD usage, remaining recording time estimate |
| Web Key | The 6-character authentication key for the web UI |
| QR Web | QR code with the web UI URL – scan with your phone |
| QR WiFi | QR code with WiFi credentials – scan to connect |
| About | Version, hostname, uptime |


The Web Key display is particularly important. It shows a 6-character hex code (like A1B2C3) that you type into the web UI to authenticate. No passwords to remember. The key is generated from a random key and displayed right on the box, ready for the operator to use; a persistent master key is also available to the sysadmin.


The QR codes are generated with the qrcode Python library:
The menu navigation uses the 4 physical buttons: Back, Up, Down, and Select. The entire OLED interface can be used without a phone or laptop. A volunteer can start recording, check status, and navigate networks using just the buttons on the box.


Pressing the Select button on any status view enters the interactive menu system:


The buttons use the gpiod library (v2 API) with software pull-up resistors.
# GPIO BCM pins for the 4 buttons (matching K1-K4 on the module) K1_BACK = 26 # Physical pin 37 K2_UP = 16 # Physical pin 36 K3_DOWN = 20 # Physical pin 38 K4_SELECT = 19 # Physical pin 35
Each button triggers an interrupt through gpiod with debounce handling. No polling loops wasting CPU.
This is an architectural detail that matters: the OLED controller is not just a display driver. It runs a Unix domain socket server at /fitebox/run/fitebox_control.sock. The FITEBOX manager and the web server connect to this socket as clients. The OLED controller acts as a message hub. When the manager sends a status update, the OLED controller broadcasts it to all connected clients (including the web server, which then pushes it to browsers via WebSocket).
The protocol is newline-delimited JSON over the Unix socket. Simple, debuggable, and reliable.
The repo includes oled_clients.py, an example client with an interactive menu that lets you connect to the socket and test things manually: update system status, start/stop recording, listen for button events, or run continuous monitoring with real psutil data. It is very useful during development and debugging.
This hub-and-spoke pattern means the OLED display, the web interface, and the manager all share the same real-time status stream. When you see a recording timer on the OLED, the web dashboard shows the same timer. They are fed by the same data.
The web interface is a FastAPI application (web/fitebox_web.py) running on port 8080 with Jinja2 templates and vanilla JavaScript (no React, no Vue, no build tools). It serves HTML, a REST API, and a WebSocket endpoint for real-time updates.
The CSS uses a dark theme (#1a1a2e background) with a mobile-first responsive design. Accent colours encode state: red for recording, blue for streaming, green for success.
FITEBOX uses HMAC-SHA256 signed requests for authentication. The flow:
HMAC(timestamp:body, key)No usernames, no passwords, no accounts. The key rotates between container restarts (also, there is a persistent master key in config/master.key).
The JavaScript client library (fitebox.js) handles HMAC signing, WebSocket reconnection (3-second auto-reconnect), and UI feedback (toasts, confirmation modals) automatically.


The Dashboard (/dashboard) is the main control panel. Everything you need during the event is on this single page.


Schedule integration: Paste the URL to your conference’s Frab/Pentabarf schedule XML (the standard format used by FOSDEM, CCC, OpenSouthCode, and many others), download it, and select the room assignment for this FITEBOX. The parser (lib/schedule_parser.py) extracts all talks by day and room.


Session info: If a conference schedule is loaded, the dashboard shows the current talk title and speaker name based on the room assignment and current time.
If you prefer to choose a different talk, click on the 3 lines icon ( ☰ ), and you will get the list of talks for that room for all days of the venue:


Once choosen you will be able to see all the details of the talk and start RECORDING:


Recording controls: the recording window includes a feature that lets you extract a ~15-second video clip from the growing MKV file and make it playable in the browser. This uses the “sandwich” technique (more on this in the Recording Engine section) and lets you confirm that the composite layout looks correct without stopping the recording. A large STOP button and several other indicators are visible in this window: a recording timer, the current file size, and session information from the conference schedule. That green line is the Health Histogram.


Health Histogram: one of my favourite features. It shows live FFmpeg stats parsed from the recording engine’s log: frames per second, bitrate, encoding speed, and dropped frames. If the encoding speed drops below 1.0x, you know immediately that something is wrong. One curious detail is that the bar never reaches the end because the video is still recording, and more data will be available seconds after the last analysis is completed. We will get back to this in the Videos section.


Streaming controls: in this button, you can control the streaming destination picker (YouTube / Twitch / Custom RTMP), and the stream key input. When you fill in the key, FITEBOX verifies it is valid and that it really connects to the selected service.


System stats: RAM usage, swap usage, disk I/O rates, network ping, and an estimated remaining recording time based on the current bitrate and available disk space.


collections.deque(maxlen=120)). When you connect via WebSocket, the server sends the full metrics history so the chart renders instantly – no waiting for data to accumulate.Header: All these stats are also available at all times in the header anywhere in the UI. When recording and streaming, FITEBOX goes through several steps:












In spanish we say that is better one image than a thousand words, here you have the 2 sides of a recording.
On one side is me recording with my cellphone:
On the other side is the final video streamed online without any edits (also a copy of the video, the original recording, is stored inside FITEBOX):
The Hardware page (/hardware) is for setup verification before recording starts.


If this is your first visit, you will be asked to click enable audio monitoring to grant the browser the required permissions to open the audio streams. Once clicked, the browser’s buffer will start receiving the audio from the different devices (the browser will play the audio once it considers it has enough audio in the buffer, which is about 7 seconds):


There is a 7-second lag between the video and the audio: It is normall and it doesn’t matter since this window is exclusively for setting up the scene and checking that the required devices are working. The green arrows on the left indicate which devices will be used during the recording (this selection is automatic and you can not change it)


-c:v copy).The Recordings page (/recordings) is the file manager.


Health Histogram: here you can see the histogram of every recording, so it warns you which videos had troubles while recording.


In-browser playback: MKV files cannot be played natively in most browsers, but FITEBOX remuxes them to MP4 on-the-fly using FFmpeg with -c copy (zero CPU, just container repackaging). The video plays directly in the browser.


Bumper concatenation: on every recording, you can click “Apply Bumpers” to combine your intro, recording, and outro into a single file. The system validates A/V sync and duration automatically and keeps a .pre-bumper backup of the original.
Metadata: Each recording has a JSON sidecar with author, title, date, duration, and technical details.
Deletion: If something went wrong, you can also DELETE files.


The System page (/system) is for pre-event configuration.


Bumper upload: Upload intro/outro video clips in any format. FITEBOX probes them with ffprobe, shows you the specs, and either copies them instantly (if the format matches: 1080p, 30fps, H.264, AAC 48kHz) or re-encodes them to match. A thumbnail preview is extracted.
Background upload: Upload your 1920×1080 branded PNG that serves as the composite recording frame. Keep in mind that the webcam and laptop images are fixed in place in this version.
Network management: Scan WiFi, connect to networks, create a hotspot (FITEBOX_AP), set static IP, and view Ethernet status.


The QR button shares the actual wifi in use:


Diagnostics: Hardware, audio, video, network, and storage checks. Run this before the event to debug and resolve issues.












The recording engine (recording_engine.sh, version 36) is the heart of FITEBOX. It reached version 36 because each version fixed a real bug discovered during manual testing on actual Pi 5 hardware. This is not a script I iterated in a simulator; each version was tested with real capture cards, real audio devices, and real conference scenarios.
The recording produces a 1920×1080 canvas with multiple layers:


The engine feeds 5 inputs to a single FFmpeg process:
| Input | Source | Flags |
|---|---|---|
| 0 | Background PNG | -loop 1 -framerate 30 |
| 1 | HDMI capture card | -f v4l2 -input_format mjpeg -video_size 1280x720 -framerate 30 |
| 2 | USB webcam | -f v4l2 -input_format mjpeg -video_size 640x480 -framerate 30 |
| 3 | Primary audio (mic) | -f alsa -ar 48000 -ac 2 |
| 4 | Secondary audio (HDMI, optional) | -f alsa -ar 48000 -ac 2 |
USB devices do not enumerate in a deterministic order. On one boot /dev/video0 might be the webcam, on the next boot, it might be the HDMI card. The recording engine solves this by detecting devices by name, not by number:
v4l2-ctl --list-devices # → Parse output looking for keywords: Hagibis, MS2109, camera, webcam
Audio detection works the same way. The detect_audio.sh script classifies ALSA cards into categories: hdmi_capture, webcam, sound_card, usb_mic, generic_usb. It then force-unmutes all capture channels, because some USB audio cards ship with capture channels muted by default:
amixer -c $CARD_NUM set Capture 100% unmute amixer -c $CARD_NUM set Mic 100% unmute amixer -c $CARD_NUM set 'Digital In' 100%
If two audio sources are detected (room mic + HDMI audio), FFmpeg combines them:
amix=inputs=2:duration=first:dropout_transition=2
If only one source is present, it passes through unchanged with acopy.
The composite is built using FFmpeg’s filter_complex:
setsar=1 is applied to fix SAR (Sample Aspect Ratio) issues from MJPEG capturedrawtextEncoder .... libx264 (software - NO hardware encoder on RPi5) Preset ..... ultrafast (the only preset that achieves ~1.0x speed) Profile .... Constrained Baseline (forced by ultrafast) Tune ....... zerolatency CRF ........ 28 (good quality for conference content) Resolution . 1920×1080 at 30fps Audio ...... AAC LC, 192 kbps, 48 kHz, stereo Container .. MKV (Matroska) Bitrate .... ~2.4 Mbps total (~1 GB/hour)
Why ultrafast? Because the Raspberry Pi 5 has no hardware H.264 encoder. All encoding is pure software libx264. The ultrafast preset is the only one that achieves real-time encoding speed (~1.0x) at 1080p on the Pi 5’s quad-core ARM CPU. Any other preset drops below 1.0x, and frames are lost.
Why MKV? Because Matroska writes data incrementally in clusters. If FFmpeg is killed (power failure, Docker stop, crash), all data up to the last completed cluster is recoverable. With MP4, a crash means the file is usually unplayable because the moov atom at the end was never written.
This is one of the more creative technical solutions in FITEBOX. While FFmpeg is actively writing to an MKV file, you cannot simply cp or ffprobe that file, the file is locked and in an inconsistent state. But we need to extract a preview clip for the web dashboard.
The solution: read the first 1MB of the file (which contains the MKV headers and track information) and the last 10MB (which contains the most recent video data), concatenate them into a temporary “sandwich” file, probe it for keyframes, and extract a ~5-second clip with an output seek.
[ first 1MB (headers) ] + [ last 10MB (recent data) ] = "sandwich.mkv" → ffprobe finds keyframes → extract ~15s clip → serve to browser
It sounds hacky. It works reliably.
The streaming pipeline (implemented in web/fitebox_web.py as Python asyncio code) was the hardest part of the entire project. I tested and rejected at least 7 different approaches before arriving at the one that works:
Rejected approaches:
tail -f on the MKV file → irregular byte bursts break Matroska cluster parsingtee muxer → “Slave muxer #0 failed” on FFmpeg 5.1concat demuxer with -c copy → H.264 profile mismatch between bumpers (Main) and recording (Constrained Baseline) → 7-second A/V desyncfilter_complex with split outputs → can only be mapped once without explicit split/asplitOne persistent FFmpeg output process maintains a single RTMP TCP connection throughout the broadcast. Content segments (intro bumper, live recording, outro bumper) are fed sequentially as MPEG-TS via the output process’s stdin.
feeder_ffmpeg → MPEG-TS bytes → Python asyncio pipe → output_ffmpeg stdin → FLV mux → RTMP
| Phase | What Happens |
|---|---|
| 1. Waiting | Polls health.json until recording starts (max 60s, 0.5s interval). Usually around 00:30. |
| 2. Buffering | Waits 15s for the MKV to accumulate enough data for reliable reading. Usually around 00:50. |
| 3. Intro | Feeds intro bumper → MPEG-TS → pipe. Video re-encoded (short clip, low CPU). |
| 4. Live | Feeds growing MKV → MPEG-TS → pipe. Video: -c:v copy (zero CPU). Audio: re-encode with aresample=async=1000. On EOF: seek back 10s, restart feeder. |
| 5. Draining | Recording stopped. Drains feeder to natural EOF (90s timeout). |
| 6. Outro | Feeds outro bumper → MPEG-TS → pipe. Video re-encoded. |
| 7. Closing | Closes stdin → drain → wait_closed → RTMP closes cleanly. |
The critical detail in the live phase is that the video uses -c:v copy – zero CPU. The Pi 5 cannot sustain two simultaneous 1080p encodes (recording + stream). By copying the already-encoded video from the MKV and only re-encoding the audio, the streaming overhead is negligible.
Each feeder process uses -output_ts_offset to chain timestamps correctly:
intro starts at ts = 0 live starts at ts = intro_duration outro starts at ts = intro_duration + live_duration
Without this, YouTube sees timestamp jumps between segments → buffering → stream dies.
This one took approximately 10 manual tests on real hardware to solve.
The problem: AAC packets in MKV containers have micro-timestamp irregularities on the order of milliseconds. During normal playback, these are inaudible. But when you copy those AAC packets directly to FLV/RTMP, the irregularities produce audible clicks every few seconds on the stream.
The wrong fix: -af aresample=async=1 – too gentle, does not correct enough drift.
The wrong fix: -c:a copy – preserves the broken timestamps entirely.
The correct fix:
-af aresample=async=1000
This aggressively corrects timestamp drift greater than ~21ms by inserting or dropping audio samples. The CPU cost of audio re-encoding is negligible compared to video.
FITEBOX runs as a single Docker container with four main processes:


Why not microservices? A Raspberry Pi is not a Kubernetes cluster. One container, all processes, Unix socket communication. Simpler to deploy, debug, and maintain. Starts faster. Uses less memory.
The Makefile provides convenient targets for running individual services during development: make oled_controller, make fitebox_manager, make fitebox_web, make recording_start, make recording_stop. It also handles Plymouth screen messages and background image conversion. During production, the Docker entrypoint handles service orchestration.
This matters because the Pi 5 CPU runs at capacity during recording. Every CPU cycle stolen by the monitoring system is a cycle FFmpeg cannot use.
The manager loop was initially taking 1.3 seconds per iteration with 20 subprocess forks (calling vcgencmd, nmcli, df, etc.). After optimisation:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Manager loop time | 1.3s | 0.2s | 85% reduction |
| Subprocess forks per iteration | 20 | 2 | 90% reduction |
| Frontend API calls per update | 3 endpoints | 1 unified | 66% reduction |
| Network info cache TTL | 5s | 60s | 12x fewer forks |
Key techniques:
psutil.cpu_percent(interval=None) instead of interval=1 (eliminates a 1-second blocking call)/sys/class/thermal/thermal_zone0/temp (instant file read) instead of vcgencmd subprocesscollections.deque(maxlen=120) for CPU/temp/network metrics (~2KB memory). Sent to the browser on WebSocket connect for instant chart restoration.The Docker entrypoint (entrypoint.sh) traps SIGTERM to ensure clean shutdown:
trap 'graceful_shutdown' SIGTERM
When Docker sends SIGTERM (on docker stop), the entrypoint gracefully stops all services in order. FFmpeg receives a clean exit signal, writes the final MKV headers, and the file is properly finalised. Without this trap, a docker stop would corrupt the current recording.
The WebSocket endpoint (/ws) authenticates on first message ({key: '...'}) and then pushes server messages:
| Message Type | Frequency | Content |
|---|---|---|
status_update | Every 3-5s | Full system state (CPU, temp, recording, network, disk) |
metrics_history | Once on connect | Ring buffer for CPU/temp/network charts |
response | On command | Command execution results |
error | On failure | Authentication failures, command errors |
These lessons come from many manual tests on real Raspberry Pi 5 hardware and the complete failure of our first production deployment at OpenSouthCode 2025. Each one costs hours of debugging.
I cannot stress this enough. The Pi 4 had v4l2m2m. The Pi 5 does not. This is a hardware decision by the Raspberry Pi Foundation, not a software bug. Do not waste time trying to enable it. All encoding must be software libx264. The entire FITEBOX architecture – the ultrafast preset, the -c:v copy streaming, the CPU optimisation work exists because of this single hardware limitation.
If PulseAudio is running, it exclusively grabs ALSA devices. FFmpeg cannot access them directly. The solution:
pulseaudio -k && sleep 1
In Docker, either do not install PulseAudio at all or kill it in the entrypoint before starting services.
This one caught me multiple times. Some USB audio cards arrive with their capture channels muted by default. The device appears in arecord -lFFmpeg opens it without error, but the recording is silent. Always force-unmute all channels before recording.
/dev/video0 is not guaranteed to be the same device across reboots. Always detect by device name, never by device number.
Covered in detail in the Streaming Pipeline section. The fix is aresample=async=1000. Always.
Without a SIGTERM trap in the entrypoint, docker stop sends SIGTERM directly to all processes. FFmpeg dies instantly. The MKV is corrupted. Always test: run docker stop while recording and verify the MKV plays correctly.
Cheap USB-C adapters cause voltage drops under load. The Pi 5 throttles the CPU (encoding speed drops below 1.0x), and USB devices randomly disconnect (your capture card or SSD vanishes mid-recording). I learned this the hard way with several corrupted test recordings before switching to the official RPi5 PSU.
This one is sneaky. Certain bulky micro-HDMI adapters or cables with thick, moulded connectors draw enough current to trigger the Pi 5’s under-voltage warning, even with the official power supply. The symptom is the lightning bolt icon and vcgencmd get_throttled showing under-voltage events.


The fix is simple: use slim, short micro HDMI cables without bulky adapters. Before blaming your power supply, try swapping the HDMI cable.
If you stop a stream and restart it too quickly, YouTube accepts the connection but degrades the quality or drops it after a few minutes. Wait 1-2 minutes between stream sessions.
| Item | DIY Cost | Professional Alternative |
|---|---|---|
| One FITEBOX unit (core + cables) | ~360€ | – |
| Full production (+ webcam + mic + sound card) | ~550€ | – |
| 5 units (full production, for 5 rooms) | ~2.750€ | 10.000-25.000+ per event |
| Recurring cost per event | 0€ (you own the hardware) | Full price again each year |
The key insight for associations: you invest once, and every future event is free. The hardware survives year after year. The software is open-source and community-maintained. You control everything.
Compare this with hiring a recording crew at 2.000-5.000€+ per event, every year, with no ownership of the equipment and no ability to customise the output.
FITEBOX is production-ready and will be deployed at OpenSouthCode 2026 in Málaga. But there are things I want to improve:
If you run a community conference, a hackerspace, a user group, or any association that records events, give FITEBOX a try. Clone the repo, build a box, and let me know how it goes. Pull requests are welcome.
David Santos – designed the hardware architecture for the first FITEBOX prototype, selected the OLED display and button components, built the preliminary OLED controller software, and designed and 3D-printed the first physical enclosure (STL files). The hardware foundation of this project exists because of David’s work.
Juanmi Taboada – software development: recording engine, streaming pipeline, web interface, system manager, OLED controller v2+, CPU optimisation, and all the FFmpeg debugging that made production stability possible.
OpenSouthCode – the annual open-source conference in Málaga, where FITEBOX was conceived, built, and deployed. The need to record every room affordably and reliably is the reason this project exists.
This project was developed with assistance from AI coding tools (Claude by Anthropic and Gemini by Google) as pair-programming partners during the development process, and I spent too many hours cleaning the code they provided, testing the wrong solutions, adding types when missing, refactoring algorithms and structures to provide sense and stability behind the craziness of AI.
Links:
FITEBOX – because every talk deserves to be recorded, and no community should go broke doing it.



🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰
The name FITEBOX is a tribute to the project’s roots in Málaga, Spain, blending local culture with technical purpose:
In short: It’s the “Check-this-out Box”, a smart, local, and powerful way to capture conference content without the headache.
The 80s hip-hop boombox:


🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰🐇🐰
🇬🇧 Read it in English, “GenCreate, GenUpdate, GenDetail and GenDelete“ En el último artículo describíamos Codenerix GenList para comenzar a usar los listados de CODENERIX. Ha...
In the last few days, I have been looking for a 3D Printer to bring some of my inventions to life. After reading and comparing, I went for an Anycubic Kobra 2 Neo. It is a great...