add comprehsensive latency analysis

This commit is contained in:
pstruebi
2025-10-08 10:25:25 +02:00
parent e503b36c50
commit c3ad7ebe3c
6 changed files with 722 additions and 79 deletions

158
LATENCY_ANALYSIS.md Normal file
View File

@@ -0,0 +1,158 @@
# Auracast Latency Analysis
**Date**: 2025-10-08
**Measured End-to-End Latency**: ~180ms
**Expected Latency**: ~130ms
**Gap**: ~74ms (unaccounted)
---
## Complete Latency Breakdown
### 1. **Transmitter Pipeline** (Measured: 32-34ms)
| Component | Latency | Status | Notes |
|-----------|---------|--------|-------|
| USB Device ADC→Callback | ~3ms | ✅ **VERIFIED** | Measured via sounddevice timing |
| ASRC Buffer (FIFO) | 1.6-3.7ms | ✅ Measured | Varies with clock drift correction |
| LC3 Encode | 0.2-0.5ms | ✅ Measured | Real-time frame encoding |
| ISO Application Queue | ~20ms | ✅ Measured | 1 frame × 20ms (iso_que_len=1) |
| HCI/Serial Queue | 0-10ms | ✅ Measured | Usually 0, UART keeps up |
| **Subtotal** | **32-34ms** | | |
### 2. **Bluetooth Transport** (32ms)
| Component | Latency | Status | Notes |
|-----------|---------|--------|-------|
| Transport Latency | 32ms | ✅ From BIG parameters | `transport_latency_big=32020` μs |
### 3. **Configured Presentation Delay** (40ms)
| Component | Latency | Status | Notes |
|-----------|---------|--------|-------|
| Presentation Delay | 40ms | ✅ From config | `presentation_delay_us=40000` |
### 4. **Total Accounted** = 104-106ms
### 5. **Missing Gap** = 74-76ms ❓
---
## Hypothesis Testing Results
### ✅ DISPROVEN: USB Device Internal Latency
**Test**: Measured sounddevice callback timing using `inputBufferAdcTime` vs `currentTime`
**Results**:
- Mean callback latency: **3.0ms**
- Range: 2-5ms
- USB device reports: 8.7ms default low latency
**Conclusion**: USB device is NOT contributing 74ms. It adds only ~3ms.
### ✅ VERIFIED: PipeWire Not Adding Significant Latency
**Settings**:
- clock.rate: 48000 Hz
- clock.quantum: 144 samples = **3ms** per cycle
- Minimal buffering
**Conclusion**: PipeWire adds ~3ms, not 74ms.
---
## Where Is the Missing 74ms?
Given that the **transmitter-side is fully accounted** (32-34ms + 32ms transport + 40ms presentation = 104-106ms), the missing **74ms must be on the RECEIVER side**.
### Likely Receiver Components:
#### 1. **Jitter Buffer** (30-50ms estimated)
- LE Audio receivers add buffering beyond `presentation_delay` to handle:
- Packet loss and retransmissions
- Clock drift between transmitter and receiver
- Radio interference and timing jitter
- **Industry typical**: 30-50ms for robust operation
#### 2. **DAC/Audio Output Pipeline** (20-30ms estimated)
- Hardware audio buffering
- DAC conversion latency
- Operating system audio stack
- Driver scheduling
#### 3. **Actual Presentation Delay > Configured** (10-20ms)
- `presentation_delay_us=40000` (40ms) may be a **minimum**
- Receiver might use larger value for robustness
- Some receivers add fixed safety margin
### **Estimated Receiver Total**: ~60-100ms (explains the 74ms gap!)
---
## Verification Steps
### To Prove Receiver-Side Latency:
1. **Check receiver's actual presentation delay**
- Query receiver's reported latency (if available via Bluetooth)
- Some LE Audio devices expose this via vendor commands
2. **Test with different presentation_delay_us values**
```python
# In auracast_config.py
presentation_delay_us = 20000 # Try 20ms instead of 40ms
```
- If end-to-end latency decreases proportionally → receiver respects it
- If latency stays ~180ms → receiver has fixed buffering
3. **Try different receiver device**
- If another LE Audio receiver shows ~130ms → confirms current receiver adds extra buffering
- If all receivers show ~180ms → it's inherent to LE Audio stack
4. **Check receiver "low latency mode"**
- Some LE Audio devices have gaming/low-latency modes
- May reduce jitter buffer at cost of robustness
---
## Current System Performance
### ✅ Excellent Transmitter Performance
- **32-34ms pipeline latency** is very good for software implementation
- ASRC stable at 1.6-3.7ms
- Encode fast at 0.2-0.5ms
- HCI queue not backing up (0-10ms)
### 📊 Bluetooth Transport
- **32ms transport latency** is standard for LE Audio with retransmissions (RTN=4)
- Could be reduced by lowering RTN, but reduces reliability
### ⚠️ Receiver Buffering (suspected)
- **~74ms unaccounted** likely in receiver
- Probably unavoidable with current receiver hardware/firmware
---
## Recommendations
1. **Accept ~180ms as baseline** if receiver-side buffering is confirmed
- Transmitter-side optimization complete
- 74ms receiver buffering is typical for robust LE Audio
2. **Test lower presentation_delay_us** (20ms instead of 40ms)
- May reduce total by ~20ms if receiver respects it
- Watch for audio glitches (sign of too-aggressive setting)
3. **Try different receiver** if ultra-low latency is critical
- Look for "gaming" or "low latency" LE Audio receivers
- Trade-off: May have more audio dropouts
4. **Consider LC3plus codec** (if supported)
- Lower frame duration (5ms vs 10ms)
- May reduce buffering requirements
- Not all devices support it
---
## Summary
**Your transmitter pipeline is excellent at ~32-34ms.** The additional 74ms gap is almost certainly receiver-side jitter buffering + DAC latency, which is typical for robust LE Audio operation. Total ~104-106ms measured on transmitter + ~74ms receiver = ~180ms end-to-end matches your measurements.

View File

@@ -313,6 +313,10 @@ async def init_broadcast(
logging.info('Setup ISO Data Path')
bigs[f'big{i}']['iso_queue'] = iso_queue
bigs[f'big{i}']['iso_que_len'] = conf.iso_que_len
# Store timing-critical BIG parameters for latency tracking
bigs[f'big{i}']['iso_interval'] = big.iso_interval # milliseconds
bigs[f'big{i}']['transport_latency_big'] = big.transport_latency_big # microseconds
logging.info(f'big{i} parameters are:')
logging.info('%s', pprint.pformat(vars(big)))
@@ -320,17 +324,53 @@ async def init_broadcast(
await asyncio.sleep(i+1) # Wait for advertising to set up
# Track HCI/serial queue flow for latency measurement
import time
flow_state = {
'last_completed': 0,
'last_log_time': 0.0,
'max_pending': 0,
'max_queued': 0,
}
def on_flow():
data_packet_queue = iso_queue.data_packet_queue
print(
f'\rPACKETS: pending={data_packet_queue.pending}, '
f'queued={data_packet_queue.queued}, '
f'completed={data_packet_queue.completed}',
end='',
)
if global_config.debug:
bigs[f'big{0}']['iso_queue'].data_packet_queue.on('flow', on_flow)
now = time.time()
# Track maximums
if data_packet_queue.pending > flow_state['max_pending']:
flow_state['max_pending'] = data_packet_queue.pending
if data_packet_queue.queued > flow_state['max_queued']:
flow_state['max_queued'] = data_packet_queue.queued
# Log periodically (every 1 second)
if now - flow_state['last_log_time'] >= 1.0:
completed_delta = data_packet_queue.completed - flow_state['last_completed']
flow_state['last_completed'] = data_packet_queue.completed
flow_state['last_log_time'] = now
# Estimate serial/HCI latency from pending packets
# Each packet takes ~iso_interval to transmit (10ms typically)
conf_ref = bigs.get(f'big{i}', {})
iso_interval_ms = conf_ref.get('iso_interval', 10.0)
hci_latency_ms = data_packet_queue.pending * iso_interval_ms
logging.info(
"LATENCY HCI/Serial: pending=%d queued=%d (HCI buffer ~%.1f ms) | completed/s=%d | max: pending=%d queued=%d",
data_packet_queue.pending,
data_packet_queue.queued,
hci_latency_ms,
completed_delta,
flow_state['max_pending'],
flow_state['max_queued']
)
# Reset max counters
flow_state['max_pending'] = 0
flow_state['max_queued'] = 0
# Always enable flow tracking for latency measurement
bigs[f'big{0}']['iso_queue'].data_packet_queue.on('flow', on_flow)
bigs[f'big{0}']['flow_state'] = flow_state
return bigs
@@ -619,14 +659,54 @@ class Streamer():
big['channels'] = pcm_format.channels
big['lc3_frame_samples'] = lc3_frame_samples
big['lc3_bytes_per_frame'] = global_config.octets_per_frame
big['sampling_frequency'] = global_config.auracast_sampling_rate_hz
big['audio_input'] = audio_input
big['encoder'] = encoder
big['precoded'] = False
# get and discard first frame
await anext(big['audio_input'].frames(big['lc3_frame_samples']), None)
# Log latency-relevant configuration and expected latency budget
for big_name, big in self.bigs.items():
frame_duration_ms = (big['lc3_frame_samples'] / big['sampling_frequency']) * 1000.0
iso_interval_ms = big.get('iso_interval', 0.0)
presentation_delay_ms = global_config.presentation_delay_us / 1000.0
transport_latency_ms = big.get('transport_latency_big', 0) / 1000.0
iso_queue_len = big.get('iso_que_len', 1)
# Expected latency breakdown:
# 1. ASRC buffer (target ~10-15ms, varies with clock drift)
# 2. Frame read/encode/write (measured ~0.2-0.5ms typically)
# 3. ISO app queue (IsoPacketStream._thresholds, max=iso_queue_len frames)
# 4. HCI/Serial queue (data_packet_queue.pending, varies with UART speed)
# 5. Transport latency (from BIG parameters - radio air time)
# 6. Presentation delay (configured delay at receiver)
asrc_target_ms = 10.0 # from io_asrc.py, but varies with clock drift
encode_estimate_ms = 0.5 # typical LC3 encode time
iso_buffer_ms = iso_queue_len * frame_duration_ms # max capacity
hci_estimate_ms = 0.0 # measured in real-time, typically 0 if UART keeps up
pipeline_estimate_ms = asrc_target_ms + encode_estimate_ms + iso_buffer_ms + hci_estimate_ms
expected_total_ms = pipeline_estimate_ms + transport_latency_ms + presentation_delay_ms
logging.info(
"LATENCY CONFIG [%s]: Frame=%.1f ms | ISO interval=%.1f ms | ISO queue capacity=%d frames",
big_name, frame_duration_ms, iso_interval_ms, iso_queue_len
)
logging.info(
"LATENCY BUDGET [%s]: Pipeline~%.1f (ASRC~%.1f + enc~%.1f + ISO~%.1f + HCI~%.1f) + Transport~%.1f + Presentation~%.1f = ~%.1f ms end-to-end",
big_name, pipeline_estimate_ms, asrc_target_ms, encode_estimate_ms, iso_buffer_ms, hci_estimate_ms, transport_latency_ms, presentation_delay_ms, expected_total_ms
)
logging.info("Streaming audio...")
bigs = self.bigs
self.is_streaming = True
# Latency tracking
import time
frame_count = 0
last_latency_log = 0.0
latency_log_interval = 1.0 # seconds
# One streamer fits all
while self.is_streaming:
stream_finished = [False for _ in range(len(bigs))]
@@ -645,13 +725,25 @@ class Streamer():
if frames_gen is None:
frames_gen = big['audio_input'].frames(big['lc3_frame_samples'])
big['frames_gen'] = frames_gen
t_read_start = time.time()
pcm_frame = await anext(frames_gen, None)
t_read_end = time.time()
# Get frame age from ASRC if available
asrc_frame_age_ms = 0.0
try:
if hasattr(big['audio_input'], '_last_frame_age_ms'):
asrc_frame_age_ms = big['audio_input']._last_frame_age_ms
except Exception:
pass
if pcm_frame is None: # Not all streams may stop at the same time
stream_finished[i] = True
continue
# Down-mix multi-channel PCM to mono for LC3 encoder if needed
t_mix_start = time.time()
if big.get('channels', 1) > 1:
if isinstance(pcm_frame, np.ndarray):
if pcm_frame.ndim > 1:
@@ -663,12 +755,66 @@ class Streamer():
samples = np.frombuffer(pcm_frame, dtype=dtype)
samples = samples.reshape(-1, big['channels']).mean(axis=1)
pcm_frame = samples.astype(dtype).tobytes()
t_mix_end = time.time()
t_encode_start = time.time()
lc3_frame = big['encoder'].encode(
pcm_frame, num_bytes=big['lc3_bytes_per_frame'], bit_depth=big['pcm_bit_depth']
)
t_encode_end = time.time()
t_write_start = time.time()
await big['iso_queue'].write(lc3_frame)
t_write_end = time.time()
# Periodic latency logging
frame_count += 1
now = time.time()
if now - last_latency_log >= latency_log_interval:
# Get ISO application queue depth (packets waiting in IsoPacketStream)
iso_depth = 0
try:
# The _thresholds deque tracks packets queued but not yet completed
iso_depth = len(big['iso_queue']._thresholds)
except Exception:
iso_depth = 0
# Calculate ISO queue latency (application-level buffering)
frame_duration_ms = (big['lc3_frame_samples'] / big['sampling_frequency']) * 1000.0
iso_latency_ms = iso_depth * frame_duration_ms
# Get HCI/serial queue depth (packets in HCI layer waiting for UART transmission)
hci_pending = 0
hci_latency_ms = 0.0
try:
hci_pending = big['iso_queue'].data_packet_queue.pending
iso_interval_ms = big.get('iso_interval', 10.0)
hci_latency_ms = hci_pending * iso_interval_ms
except Exception:
pass
# Measure operation times
read_ms = (t_read_end - t_read_start) * 1000.0
encode_ms = (t_encode_end - t_encode_start) * 1000.0 if 'precoded' not in big or not big['precoded'] else 0.0
write_ms = (t_write_end - t_write_start) * 1000.0
# Calculate total pipeline latency: ASRC age + encode + ISO buffer + HCI buffer
pipeline_latency_ms = asrc_frame_age_ms + encode_ms + iso_latency_ms + hci_latency_ms
# Get transport and presentation from config
transport_ms = big.get('transport_latency_big', 0) / 1000.0
presentation_ms = global_config.presentation_delay_us / 1000.0
total_expected_ms = pipeline_latency_ms + transport_ms + presentation_ms
logging.info(
"LATENCY REALTIME: capture→radio=%.1f ms (ASRC=%.1f + enc=%.1f + ISO_q=%.1f + HCI=%.1f) | depths: ISO=%d HCI=%d",
pipeline_latency_ms, asrc_frame_age_ms, encode_ms, iso_latency_ms, hci_latency_ms, iso_depth, hci_pending
)
logging.info(
"LATENCY TOTAL: capture→speaker = %.1f ms pipeline + %.1f ms transport + %.1f ms presentation = %.1f ms (measured ~180ms, gap=%.1f ms)",
pipeline_latency_ms, transport_ms, presentation_ms, total_expected_ms, 180.0 - total_expected_ms
)
last_latency_log = now
if all(stream_finished): # Take into account that multiple files have different lengths
logging.info('All streams finished, stopping streamer')

View File

@@ -60,7 +60,7 @@ if __name__ == "__main__":
# Load .env located next to this script (only uppercase keys will be referenced)
load_dotenv(dotenv_path='.env')
os.environ.setdefault("PULSE_LATENCY_MSEC", "3")
#os.environ.setdefault("PULSE_LATENCY_MSEC", "3")
# Refresh device cache and list inputs
refresh_pw_cache()
@@ -156,7 +156,7 @@ if __name__ == "__main__":
octets_per_frame = OCTETS_PER_FRAME,
transport=TRANSPORT1
)
config.debug = True
#config.debug = True
logging.info(config.model_dump_json(indent=2))
multicast.run_async(

View File

@@ -1 +1 @@
5078804E6FBCF893D5537715FD928E46AD576ECB
5078804E6FBCF893D5537715FD928E46AD576ECD

View File

@@ -16,6 +16,10 @@
from bumble.audio.io import PcmFormat, ThreadedAudioInput, logger # only top-level import
# Toggle for local debug logging in this module
DEBUG_LOG = False
LATENCY_DEBUG = True # Track latency through the pipeline
class SoundDeviceAudioInputAsrc(ThreadedAudioInput):
"""Sound device audio input with a simple ASRC stage.
@@ -27,7 +31,9 @@ class SoundDeviceAudioInputAsrc(ThreadedAudioInput):
- _close() -> None
Behavior:
- Captures mono float32 frames from the device.
- Captures mono PCM frames using the device's native integer format by
attempting (in order): S32LE (int32), S24LE (packed int24), S16LE (int16),
and converts to internal float32.
- Buffers into an internal ring buffer.
- Produces stereo INT16 frames using a linear-interp resampler whose
ratio is adjusted by a tiny PI loop to hold FIFO depth near a target.
@@ -73,7 +79,7 @@ class SoundDeviceAudioInputAsrc(ThreadedAudioInput):
# Target FIFO level and deadband (slightly larger headroom for jitter)
fs = float(self._pcm_format_in.sample_rate)
self._target_samples = max(1, int(0.015 * fs)) # 20ms headroom to avoid underflows
self._target_samples = max(1, int(0.015 * fs))
# Slightly larger deadband to reduce chatter (≈1 ms)
self._deadband = max(1, int(0.001 * fs))
@@ -92,9 +98,28 @@ class SoundDeviceAudioInputAsrc(ThreadedAudioInput):
self._in_samples = 0
self._in_last_log = 0.0
# --- Benchmarking state for _read() ---
self._bench_count = 0
self._bench_total_sum = 0.0
self._bench_proc_sum = 0.0
self._bench_conv_sum = 0.0
self._bench_total_min = float('inf')
self._bench_total_max = 0.0
self._bench_last_log = 0.0
self._bench_loop = 0 # increments every _read call
# Streaming resampler and internal output buffer (lazy init)
self._rs = None # samplerate.Resampler
self._out_buf = None # numpy.ndarray float32
self._in_dtype = None # chosen input dtype string: 'int32' | 'int24' | 'int16'
# Latency tracking
self._capture_timestamp = None # timestamp of last captured samples
self._latency_log_interval = 1.0 # log every N seconds
self._last_latency_log = 0.0
# Frame timestamp queue (circular buffer)
self._frame_timestamps = [] # list of (sample_position, timestamp)
self._total_samples_written = 0 # total samples written to FIFO
# ---------------- Internal helpers -----------------
def _init_rb(self) -> None:
@@ -196,7 +221,7 @@ class SoundDeviceAudioInputAsrc(ThreadedAudioInput):
r_cmd = self._R0 * (1.0 + ppm_cmd * 1e-6)
# Slew-rate limit per update
ppm_prev = 1e6 * (self._r / self._R0 - 1.0)
ppm_prev = 1e6 * (self._r / self._R0 - 1.0) # --- Benchmarking state for _read() ---
dppm = ppm_cmd - ppm_prev
if dppm > self._max_step_ppm:
ppm_cmd = ppm_prev + self._max_step_ppm
@@ -218,7 +243,7 @@ class SoundDeviceAudioInputAsrc(ThreadedAudioInput):
try:
import time as _time
now = _time.time()
if now - self._last_log > 1.0:
if DEBUG_LOG and (now - self._last_log > 1.0):
buf_ms = 1000.0 * self._fifo_len() / float(self._pcm_format_in.sample_rate)
print(
f"\nASRC buf={buf_ms:5.1f} ms r={self._r:.9f} corr={1e6 * (self._r / self._R0 - 1.0):+7.1f} ppm"
@@ -277,16 +302,17 @@ class SoundDeviceAudioInputAsrc(ThreadedAudioInput):
out = np.zeros(n_out, dtype=np.float32)
# Debug: report zero-padding underflow
try:
import time as _time
if _time.time() - self._last_log > 0.5:
produced = int(self._out_buf.size)
fifo_now = int(self._fifo_len())
corr_ppm = 1e6 * (self._r / self._R0 - 1.0)
print(
f"\nASRC debug: zero-padding underflow produced={produced}/{n_out} "
f"fifo={fifo_now} r={self._r:.9f} corr={corr_ppm:+.1f} ppm"
)
self._last_log = _time.time()
if DEBUG_LOG:
import time as _time
if _time.time() - self._last_log > 0.5:
produced = int(self._out_buf.size)
fifo_now = int(self._fifo_len())
corr_ppm = 1e6 * (self._r / self._R0 - 1.0)
print(
f"\nASRC debug: zero-padding underflow produced={produced}/{n_out} "
f"fifo={fifo_now} r={self._r:.9f} corr={corr_ppm:+.1f} ppm"
)
self._last_log = _time.time()
except Exception:
pass
if self._out_buf.size:
@@ -310,7 +336,7 @@ class SoundDeviceAudioInputAsrc(ThreadedAudioInput):
# ---------------- ThreadedAudioInput hooks -----------------
def _open(self) -> PcmFormat:
# Set up sounddevice RawInputStream (int16) and start callback producer
# Set up sounddevice RawInputStream (auto: int32 -> int24 -> int16) and start callback producer
import sounddevice # pylint: disable=import-error
import math
import samplerate as sr # type: ignore
@@ -394,61 +420,122 @@ class SoundDeviceAudioInputAsrc(ThreadedAudioInput):
pa_ver = sounddevice.get_portaudio_version()[1]
except Exception:
pa_ver = ''
logger.info(
"ASRC sounddevice: device='%s' hostapi='%s' samplerate=%d channels=%d portaudio='%s'",
dev_info.get('name', '?'),
hostapi_info.get('name', '?'),
samplerate,
channels,
pa_ver or '?'
)
if DEBUG_LOG:
logger.info(
"ASRC sounddevice: device='%s' hostapi='%s' samplerate=%d channels=%d portaudio='%s'",
dev_info.get('name', '?'),
hostapi_info.get('name', '?'),
samplerate,
channels,
pa_ver or '?'
)
def _callback(indata, frames, time_info, status): # noqa: ARG001 (signature is fixed)
# indata: raw int16 bytes-like buffer of shape (frames, channels)
# indata: raw integer PCM bytes-like buffer of shape (frames, channels)
try:
if status:
if status and DEBUG_LOG:
logger.warning("Input status: %s", status)
if frames <= 0:
return
# Vectorized convert INT16 LE mono -> float32 [-1,1]
# Convert native integer PCM -> float32 [-1,1]
import numpy as _np # type: ignore
try:
x_i16 = _np.frombuffer(indata, dtype=_np.int16)
x_f32 = _np.asarray(x_i16, dtype=_np.float32) * (1.0 / 32768.0)
except Exception:
# Fallback to safe (slower) path if buffer view fails
mv = memoryview(indata).cast('h')
x_f32 = _np.empty(frames, dtype=_np.float32)
for i in range(frames):
v = mv[i]
f = float(v) / 32768.0
if not (f == f) or math.isinf(f):
f = 0.0
x_f32[i] = f
dtype = self._in_dtype or 'int32'
if dtype == 'int32':
try:
x_i32 = _np.frombuffer(indata, dtype=_np.int32)
x_f32 = _np.asarray(x_i32, dtype=_np.float32) * (1.0 / 2147483648.0)
except Exception:
mv = memoryview(indata).cast('i')
x_f32 = _np.empty(frames, dtype=_np.float32)
for i in range(frames):
v = mv[i]
f = float(v) / 2147483648.0
if not (f == f) or math.isinf(f):
f = 0.0
x_f32[i] = f
elif dtype == 'int16':
try:
x_i16 = _np.frombuffer(indata, dtype=_np.int16)
x_f32 = _np.asarray(x_i16, dtype=_np.float32) * (1.0 / 32768.0)
except Exception:
mv = memoryview(indata).cast('h')
x_f32 = _np.empty(frames, dtype=_np.float32)
for i in range(frames):
v = mv[i]
f = float(v) / 32768.0
if not (f == f) or math.isinf(f):
f = 0.0
x_f32[i] = f
else: # 'int24' packed S24LE
try:
b = _np.frombuffer(indata, dtype=_np.uint8)
# Expect length == frames * channels * 3, channels=1
if b.size != frames * 3:
# Fallback: truncate to multiple of 3
n3 = (b.size // 3) * 3
b = b[:n3]
a = b.reshape(-1, 3)
# Little-endian assembly
val = (a[:, 0].astype(_np.int32) |
(a[:, 1].astype(_np.int32) << 8) |
(a[:, 2].astype(_np.int32) << 16))
# Sign-extend from 24-bit
neg = (a[:, 2] & 0x80) != 0
val[neg] -= (1 << 24)
x_f32 = val.astype(_np.float32) * (1.0 / 8388608.0) # 2**23
except Exception:
# Very slow scalar fallback
mv = memoryview(indata)
x_f32 = _np.empty(frames, dtype=_np.float32)
for i in range(frames):
o = i * 3
b0 = mv[o]
b1 = mv[o + 1]
b2 = mv[o + 2]
v = (b0 | (b1 << 8) | (b2 << 16))
if b2 & 0x80:
v -= (1 << 24)
f = float(v) / 8388608.0
if not (f == f) or math.isinf(f):
f = 0.0
x_f32[i] = f
# Debug: detect silent input chunks (all zeros)
try:
import time as _time
if _np.max(_np.abs(x_f32)) == 0.0 and (_time.time() - self._last_log > 0.5):
print("\nASRC debug: input chunk is silent (all zeros)")
self._last_log = _time.time()
if DEBUG_LOG:
import time as _time
if _np.max(_np.abs(x_f32)) == 0.0 and (_time.time() - self._last_log > 0.5):
print("\nASRC debug: input chunk is silent (all zeros)")
self._last_log = _time.time()
except Exception:
pass
# Write to FIFO
self._fifo_write(x_f32)
# Track capture timestamp for latency measurement
if LATENCY_DEBUG:
import time as _time
capture_ts = _time.time()
self._capture_timestamp = capture_ts
# Record timestamp for these samples
self._total_samples_written += frames
self._frame_timestamps.append((self._total_samples_written, capture_ts))
# Keep only last 100 timestamps
if len(self._frame_timestamps) > 100:
self._frame_timestamps.pop(0)
# Input throughput logging
try:
import time as _time
self._in_samples += int(frames)
now = _time.time()
if now - self._in_last_log >= 1.0:
sps = self._in_samples / max(1e-9, (now - self._in_last_log))
fifo_now = int(self._fifo_len())
corr_ppm = 1e6 * (self._r / self._R0 - 1.0)
print(
f"\nASRC debug: input_sps={sps:.1f} fifo={fifo_now} r={self._r:.9f} corr={corr_ppm:+.1f} ppm"
)
self._in_samples = 0
self._in_last_log = now
if DEBUG_LOG:
import time as _time
self._in_samples += int(frames)
now = _time.time()
if now - self._in_last_log >= 1.0:
sps = self._in_samples / max(1e-9, (now - self._in_last_log))
fifo_now = int(self._fifo_len())
corr_ppm = 1e6 * (self._r / self._R0 - 1.0)
print(
f"\nASRC debug: input_sps={sps:.1f} fifo={fifo_now} r={self._r:.9f} corr={corr_ppm:+.1f} ppm"
)
self._in_samples = 0
self._in_last_log = now
except Exception:
pass
except Exception: # never let callback raise
@@ -461,17 +548,33 @@ class SoundDeviceAudioInputAsrc(ThreadedAudioInput):
logger.exception("Failed to create samplerate.Resampler; audio may be silent")
self._rs = None
self._stream = sounddevice.RawInputStream(
samplerate=samplerate,
device=self._device,
channels=channels,
dtype='int16',
callback=_callback,
blocksize=int( 2.0e-3 * samplerate),
latency='low', # Use low latency to minimize buffer size
)
# Try opening stream with preferred native formats
last_err = None
for cand in ('int32', 'int24', 'int16'):
try:
st = sounddevice.RawInputStream(
samplerate=samplerate,
device=self._device,
channels=channels,
dtype=cand,
callback=_callback,
blocksize=int(2e-3 * samplerate),
latency='low',
)
self._stream = st
self._in_dtype = cand
if DEBUG_LOG:
logger.info("ASRC selected input dtype: %s", cand)
break
except Exception as e: # keep trying next
last_err = e
continue
if self._stream is None:
logger.exception("ASRC failed to open RawInputStream for all dtypes (int32,int24,int16)")
if last_err:
raise last_err
raise RuntimeError("ASRC: could not open input stream")
# Don't start stream yet - wait for first read to avoid buffer buildup
# self._stream.start() # Delayed until first _read()
return self._pcm_format_out
@@ -479,23 +582,138 @@ class SoundDeviceAudioInputAsrc(ThreadedAudioInput):
# Produce 'frame_size' output frames (stereo INT16)
if frame_size <= 0:
return b''
# loop counter
self._bench_loop += 1
# Benchmark start
try:
from time import perf_counter as _pc
except Exception:
_pc = None # type: ignore
_t0 = _pc() if _pc else None
# Track which samples we're reading for latency measurement
read_start_time = None
if LATENCY_DEBUG:
import time as _time
read_start_time = _time.time()
# Start stream on first read to avoid buffer buildup during initialization
if self._stream and not self._stream_started:
self._stream.start()
self._stream_started = True
# Don't block - let callback start filling buffer asynchronously
_t1 = _pc() if _pc else None
# Update resampling ratio based on FIFO level
try:
self._update_ratio()
except Exception:
# keep going even if update failed
pass
_t2 = _pc() if _pc else None
# Process mono float32
mono = self._process(frame_size)
_t3 = _pc() if _pc else None
# Convert to stereo int16 LE bytes
return self._mono_to_stereo_int16_bytes(mono)
out = self._mono_to_stereo_int16_bytes(mono)
_t4 = _pc() if _pc else None
# Latency tracking: estimate frame age from timestamp queue
frame_age_ms = 0.0
if LATENCY_DEBUG and read_start_time and self._frame_timestamps:
import time as _time
now = _time.time()
# Find oldest timestamp still in FIFO (conservative estimate)
fifo_samples = self._fifo_len()
if fifo_samples > 0 and self._frame_timestamps:
# Estimate: we're reading from tail of FIFO, oldest data is ~fifo_samples ago
# Find timestamp of samples that are now at the read position
read_position = self._total_samples_written - fifo_samples
# Find closest timestamp
frame_ts = None
for pos, ts in self._frame_timestamps:
if pos >= read_position:
frame_ts = ts
break
if frame_ts:
frame_age_ms = 1000.0 * (now - frame_ts)
fifo_ms = 1000.0 * fifo_samples / float(self._pcm_format_in.sample_rate)
if now - self._last_latency_log >= self._latency_log_interval:
logger.info(
"LATENCY: ASRC frame_age=%.1f ms | FIFO: %d samples (%.1f ms) | corr: %+.1f ppm",
frame_age_ms,
fifo_samples,
fifo_ms,
1e6 * (self._r / self._R0 - 1.0)
)
self._last_latency_log = now
# Aggregate benchmarking stats
try:
if _pc and _t0 is not None and _t4 is not None:
total = _t4 - _t0
proc = (_t3 - _t2) if (_t3 is not None and _t2 is not None) else 0.0
conv = (_t4 - _t3) if (_t4 is not None and _t3 is not None) else 0.0
# Compute 'other' segment (everything outside proc+conv)
other = max(0.0, total - (proc + conv))
# Immediate warning if proc+conv or total exceeds 10 ms
if DEBUG_LOG and ((proc + conv) > 0.010 or total > 0.010):
logger.warning(
"ASRC _read SLOW: loop=%d frame=%d total=%.3f ms | proc=%.3f ms conv=%.3f ms other=%.3f ms",
self._bench_loop,
frame_size,
1000.0 * total,
1000.0 * proc,
1000.0 * conv,
1000.0 * other,
)
self._bench_count += 1
self._bench_total_sum += total
self._bench_proc_sum += proc
self._bench_conv_sum += conv
if total < self._bench_total_min:
self._bench_total_min = total
if total > self._bench_total_max:
self._bench_total_max = total
# Periodic log (~2s)
import time as _time
now = _time.time()
if DEBUG_LOG and (now - self._bench_last_log >= 2.0) and self._bench_count > 0:
avg_total_ms = 1000.0 * (self._bench_total_sum / self._bench_count)
avg_proc_ms = 1000.0 * (self._bench_proc_sum / self._bench_count)
avg_conv_ms = 1000.0 * (self._bench_conv_sum / self._bench_count)
min_ms = 1000.0 * (self._bench_total_min if self._bench_total_min != float('inf') else 0.0)
max_ms = 1000.0 * self._bench_total_max
# Per-frame timing (µs) for convenience
per_frame_us = (avg_total_ms * 1000.0) / max(1, frame_size)
logger.info(
"ASRC _read bench: calls=%d frame=%d avg=%.3f ms min=%.3f ms max=%.3f ms | proc=%.3f ms conv=%.3f ms | per_frame=%.2f µs",
self._bench_count,
frame_size,
avg_total_ms,
min_ms,
max_ms,
avg_proc_ms,
avg_conv_ms,
per_frame_us,
)
# Reset period accumulators but keep extrema across run
self._bench_count = 0
self._bench_total_sum = 0.0
self._bench_proc_sum = 0.0
self._bench_conv_sum = 0.0
self._bench_last_log = now
except Exception:
# Never let benchmarking break audio path
pass
# Attach frame timestamp to output (as a pseudo-header for tracking)
# We'll store it in a class variable that multicast.py can read
if LATENCY_DEBUG and frame_age_ms > 0:
self._last_frame_age_ms = frame_age_ms
return out
def _close(self) -> None:
try:

View File

@@ -0,0 +1,121 @@
#!/usr/bin/env python3
"""
Test USB audio device latency by measuring time from generating a click
to receiving it back through the microphone (requires physical loopback).
If no physical loopback is available, this measures the baseline capture latency
by timestamping at different points in the audio pipeline.
"""
import sounddevice as sd
import numpy as np
import time
import sys
print("Testing USB Audio Device Latency")
print("=" * 60)
# List devices
print("\nAvailable devices:")
devices = sd.query_devices()
for i, dev in enumerate(devices):
print(f" {i}: {dev['name']}")
if dev['max_input_channels'] > 0:
print(f" Input: {dev['max_input_channels']} ch, latency: {dev['default_low_input_latency']*1000:.1f}ms (low) / {dev['default_high_input_latency']*1000:.1f}ms (high)")
if dev['max_output_channels'] > 0:
print(f" Output: {dev['max_output_channels']} ch, latency: {dev['default_low_output_latency']*1000:.1f}ms (low) / {dev['default_high_output_latency']*1000:.1f}ms (high)")
# Find USB Audio Device
usb_device_idx = None
for i, dev in enumerate(devices):
if 'USB Audio' in dev['name'] and dev['max_input_channels'] > 0:
usb_device_idx = i
break
if usb_device_idx is None:
print("\nError: USB Audio Device not found!")
sys.exit(1)
device_info = devices[usb_device_idx]
print(f"\nTesting device #{usb_device_idx}: {device_info['name']}")
print(f" Default low latency: {device_info['default_low_input_latency']*1000:.1f}ms")
print(f" Default high latency: {device_info['default_high_input_latency']*1000:.1f}ms")
print(f" Default sample rate: {device_info['default_samplerate']} Hz")
# Test capture latency by measuring callback timing
print("\n" + "="*60)
print("Test: Measuring callback timing (10 second capture)")
print("="*60)
capture_times = []
callback_count = 0
def callback(indata, frames, time_info, status):
global callback_count
if status:
print(f"Status: {status}")
# Record timing info
capture_times.append({
'callback_num': callback_count,
'frames': frames,
'input_buffer_adc_time': time_info.inputBufferAdcTime,
'current_time': time_info.currentTime,
'callback_latency': (time_info.currentTime - time_info.inputBufferAdcTime) * 1000, # ms
})
callback_count += 1
samplerate = 48000
blocksize = int(0.002 * samplerate) # 2ms blocks (matches ASRC)
print(f"\nConfig: {samplerate}Hz, blocksize={blocksize} ({blocksize/samplerate*1000:.1f}ms)")
print("\nStarting capture...")
try:
with sd.InputStream(
device=usb_device_idx,
channels=1,
samplerate=samplerate,
blocksize=blocksize,
latency='low',
callback=callback
):
time.sleep(10)
except Exception as e:
print(f"Error: {e}")
sys.exit(1)
print(f"\nCaptured {len(capture_times)} callbacks")
if capture_times:
latencies = [t['callback_latency'] for t in capture_times]
print(f"\nCallback Latency Statistics:")
print(f" Mean: {np.mean(latencies):.2f} ms")
print(f" Median: {np.median(latencies):.2f} ms")
print(f" Min: {np.min(latencies):.2f} ms")
print(f" Max: {np.max(latencies):.2f} ms")
print(f" StdDev: {np.std(latencies):.2f} ms")
print(f"\nFirst 5 callbacks:")
for t in capture_times[:5]:
print(f" #{t['callback_num']}: {t['frames']} frames, latency={t['callback_latency']:.2f}ms")
print(f"\nLast 5 callbacks:")
for t in capture_times[-5:]:
print(f" #{t['callback_num']}: {t['frames']} frames, latency={t['callback_latency']:.2f}ms")
print("\n" + "="*60)
print("INTERPRETATION:")
print("="*60)
print("The 'callback_latency' shows the time between when audio was")
print("captured by the USB device ADC (inputBufferAdcTime) and when")
print("the callback is invoked (currentTime).")
print("")
print("This includes:")
print(" - USB device internal buffering")
print(" - USB bus latency")
print(" - PipeWire/ALSA buffering")
print(" - Driver/kernel scheduling latency")
print("")
print("If this latency is ~50-70ms, it explains your missing gap!")
print("="*60)