Getting Started¶

Installation¶

pip install aic-sdk

For examples:

pip install -r examples/requirements.txt

License Key¶

Set the environment variable (or use a .env file), and pass it to the model.

export AIC_SDK_LICENSE="your_license_key"

or in .env:

AIC_SDK_LICENSE=your_license_key

First Enhancement¶

import os
import numpy as np
from dotenv import load_dotenv
from aic import Model, AICModelType, AICParameter

load_dotenv()
license_key = os.getenv("AIC_SDK_LICENSE")

with Model(AICModelType.QUAIL_L, license_key=license_key, sample_rate=48000, channels=1, frames=480) as model:
    model.set_parameter(AICParameter.ENHANCEMENT_LEVEL, 0.7)

    # Planar format: (channels, frames) - default processing method
    audio = np.random.randn(1, 480).astype(np.float32)
    enhanced = model.process(audio)

Processing Methods¶

The SDK supports three audio layouts:

Planar (default): Separate buffer per channel (channels, frames)

audio = np.random.randn(2, 480).astype(np.float32)  # 2 channels, 480 frames
enhanced = model.process(audio)

Interleaved: Channels interleaved in single buffer (frames * channels,)

audio = np.random.randn(960).astype(np.float32)  # 2 channels * 480 frames
enhanced = model.process_interleaved(audio, channels=2)

Sequential: All samples for channel 0, then channel 1, etc. (frames * channels,)

ch0 = np.random.randn(480).astype(np.float32)
ch1 = np.random.randn(480).astype(np.float32)
audio = np.concatenate([ch0, ch1])  # Sequential layout
enhanced = model.process_sequential(audio, channels=2)

Use optimal_num_frames() to get a recommended buffer size for streaming.
Use optimal_sample_rate() for the preferred I/O sample rate.

Model Types¶

The SDK provides several model families optimized for different use cases:

QUAIL_L / QUAIL_S: General-purpose enhancement models (auto-selects sample rate variant)
QUAIL_XS / QUAIL_XXS: Lower latency models for real-time applications
QUAIL_STT_*: Models optimized for speech-to-text applications:
QUAIL_STT_L16 - 16 kHz, large variant
QUAIL_STT_L8 - 8 kHz, large variant
QUAIL_STT_S16 - 16 kHz, small variant
QUAIL_STT_S8 - 8 kHz, small variant
QUAIL_VF_STT_L16 - Voice Focus model for isolating foreground speaker

Note: QUAIL_STT is deprecated; use QUAIL_STT_L16 instead.

Voice Activity Detection (VAD)¶

Create a VAD tied to your model to get speech activity predictions with minimal effort.

from aic import Model, AICModelType, AICVadParameter

with Model(AICModelType.QUAIL_L, license_key=license_key, sample_rate=48000, channels=1, frames=480) as model:
    with model.create_vad() as vad:
        vad.set_parameter(AICVadParameter.SPEECH_HOLD_DURATION, 0.05)
        vad.set_parameter(AICVadParameter.SENSITIVITY, 6.0)

        # Drive the model to produce VAD predictions
        audio = np.random.randn(1, 480).astype(np.float32)
        model.process(audio)
        print("speech detected:", vad.is_speech_detected())