Detect and record audio in Python

99

I need to capture audio clips as WAV files that I can then pass to another bit of Python for processing. The problem is that I need to determine when there is audio present and then record it, stop when it goes silent, and then pass that file to the processing module.

I'm thinking it should be possible with the wave module to detect when there is pure silence and discard it; then, as soon as something other than silence is detected, start recording, and when the line goes silent again, stop recording.

I just can't get my head around it. Can anyone get me started with a basic example?

python wav audio-recording

— SilentGhost

106

As a follow-up to Nick Fortescue's answer, here's a more complete example of how to record from the microphone and process the resulting data:

from sys import byteorder
from array import array
from struct import pack

import pyaudio
import wave

THRESHOLD = 500
CHUNK_SIZE = 1024
FORMAT = pyaudio.paInt16
RATE = 44100

def is_silent(snd_data):
    "Returns 'True' if below the 'silent' threshold"
    return max(snd_data) < THRESHOLD

def normalize(snd_data):
    "Average the volume out"
    MAXIMUM = 16384
    times = float(MAXIMUM)/max(abs(i) for i in snd_data)

    r = array('h')
    for i in snd_data:
        r.append(int(i*times))
    return r

def trim(snd_data):
    "Trim the blank spots at the start and end"
    def _trim(snd_data):
        snd_started = False
        r = array('h')

        for i in snd_data:
            if not snd_started and abs(i)>THRESHOLD:
                snd_started = True
                r.append(i)

            elif snd_started:
                r.append(i)
        return r

    # Trim to the left
    snd_data = _trim(snd_data)

    # Trim to the right
    snd_data.reverse()
    snd_data = _trim(snd_data)
    snd_data.reverse()
    return snd_data

def add_silence(snd_data, seconds):
    "Add silence to the start and end of 'snd_data' of length 'seconds' (float)"
    silence = [0] * int(seconds * RATE)
    r = array('h', silence)
    r.extend(snd_data)
    r.extend(silence)
    return r

def record():
    """
    Record a word or words from the microphone and 
    return the data as an array of signed shorts.

    Normalizes the audio, trims silence from the 
    start and end, and pads with 0.5 seconds of 
    blank sound to make sure VLC et al can play 
    it without getting chopped off.
    """
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT, channels=1, rate=RATE,
        input=True, output=True,
        frames_per_buffer=CHUNK_SIZE)

    num_silent = 0
    snd_started = False

    r = array('h')

    while 1:
        # little endian, signed short
        snd_data = array('h', stream.read(CHUNK_SIZE))
        if byteorder == 'big':
            snd_data.byteswap()
        r.extend(snd_data)

        silent = is_silent(snd_data)

        if silent and snd_started:
            num_silent += 1
        elif not silent and not snd_started:
            snd_started = True

        if snd_started and num_silent > 30:
            break

    sample_width = p.get_sample_size(FORMAT)
    stream.stop_stream()
    stream.close()
    p.terminate()

    r = normalize(r)
    r = trim(r)
    r = add_silence(r, 0.5)
    return sample_width, r

def record_to_file(path):
    "Records from the microphone and outputs the resulting data to 'path'"
    sample_width, data = record()
    data = pack('<' + ('h'*len(data)), *data)

    wf = wave.open(path, 'wb')
    wf.setnchannels(1)
    wf.setsampwidth(sample_width)
    wf.setframerate(RATE)
    wf.writeframes(data)
    wf.close()

if __name__ == '__main__':
    print("please speak a word into the microphone")
    record_to_file('demo.wav')
    print("done - result written to demo.wav")

— crio
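
The start/stop logic of record() in the answer above can be tried without a microphone. What follows is only a minimal sketch, not the answer's own code: `capture_until_silence` and `max_silent_chunks` are hypothetical names, and any iterable of sample chunks stands in for repeated stream.read() calls. It also makes two assumptions that differ from the loop above: silence is judged on the absolute peak, and the silence counter resets whenever sound resumes.

```python
from array import array

THRESHOLD = 500  # same illustrative threshold as in the answer above


def is_silent(chunk, threshold=THRESHOLD):
    """Return True if the absolute peak of the chunk is below the threshold."""
    return max((abs(s) for s in chunk), default=0) < threshold


def capture_until_silence(chunks, max_silent_chunks=30):
    """Collect samples from the first loud chunk until a run of silent chunks.

    `chunks` is any iterable of array('h') blocks (a hypothetical stand-in
    for repeated stream.read() calls).
    """
    recorded = array('h')
    started = False
    silent_run = 0
    for chunk in chunks:
        silent = is_silent(chunk)
        if not started:
            if silent:
                continue  # discard leading silence
            started = True
        recorded.extend(chunk)
        # Count consecutive silent chunks; any sound resets the counter.
        silent_run = silent_run + 1 if silent else 0
        if silent_run > max_silent_chunks:
            break
    return recorded
```

With real hardware, each chunk would come from array('h', stream.read(CHUNK_SIZE)); the returned array can then go through normalize() and trim() as before.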

17

To make this work in Python 3, just replace xrange with range.

— Ben Elgar

1

Great example! It was really useful when I was trying to understand how to record voice using Python. A quick question: is there a way to set the recording duration? Right now it records one word, correct? Can I tweak it to record for a fixed period of, say, 10 seconds? Thanks!

— Swan87
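
One possible way to get a fixed-length recording is to read a fixed number of chunks instead of waiting for silence. A small sketch, assuming the RATE and CHUNK_SIZE values from the answer; `chunks_for_duration` is a hypothetical helper, not part of PyAudio:

```python
import math

RATE = 44100       # samples per second, as in the answer above
CHUNK_SIZE = 1024  # samples per read, as in the answer above


def chunks_for_duration(seconds, rate=RATE, chunk_size=CHUNK_SIZE):
    """Number of chunk reads needed to cover at least `seconds` of audio."""
    return math.ceil(seconds * rate / chunk_size)


# Hypothetical use inside record(): replace `while 1:` with
#     for _ in range(chunks_for_duration(10)):
# and drop the silence-based break.
print(chunks_for_duration(10))
```

Rounding up means the recording runs slightly longer than requested rather than slightly shorter.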

The detection and normalization are not correct, because they operate on bytes and not shorts. That buffer would have to be converted into a numpy array before processing.

— ArekBulski
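
For what it's worth, array('h', ...) as used in the answer decodes a byte buffer into 16-bit values rather than iterating over single bytes, so numpy may not be strictly required for this step. A quick check, assuming 16-bit little-endian PCM as stated in the answer's code comments; `bytes_to_samples` is a hypothetical helper name:

```python
import sys
from array import array
from struct import pack


def bytes_to_samples(raw):
    """Decode little-endian 16-bit PCM bytes into an array of signed shorts."""
    samples = array('h')
    samples.frombytes(raw)
    if sys.byteorder == 'big':
        samples.byteswap()  # data is little-endian; swap on big-endian hosts
    return samples


raw = pack('<3h', -1000, 0, 1000)   # 6 bytes holding 3 samples
samples = bytes_to_samples(raw)
print(len(raw), len(samples), samples.tolist())
```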

Neither xrange nor range was actually needed in add_silence (so it's gone now). I think Arek may be onto something here - the transition from silence to the 'word' sounds very jagged. I think there are other answers that address this too.

— Tomasz Gandor

47

I believe the WAVE module does not support recording, only processing existing files. You might want to look at PyAudio for the actual recording. WAV is about the simplest file format in the world. In paInt16 you just get a signed integer representing a level, and closer to 0 is quieter. I can't remember whether WAV files are high byte first or low byte first, but something like this ought to work (sorry, I'm not really a python programmer):

from array import array

# you'll probably want to experiment on threshold
# depends how noisy the signal
threshold = 10 
max_value = 0

# data is the raw byte string returned by stream.read()
as_ints = array('h', data)
max_value = max(as_ints)
if max_value > threshold:
    # not silence
    pass

PyAudio code for recording kept for reference:

import pyaudio
import sys

chunk = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
RECORD_SECONDS = 5

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS, 
                rate=RATE, 
                input=True,
                output=True,
                frames_per_buffer=chunk)

print("* recording")
for i in range(0, int(RATE / chunk * RECORD_SECONDS)):
    data = stream.read(chunk)
    # check for silence here by comparing the level with 0 (or some threshold) for 
    # the contents of data.
    # then write data or not to a file

print("* done")

stream.stop_stream()
stream.close()
p.terminate()

— Nick Fortescue

Thanks, Nick. Yes, I should have said that I'm also using portaudio for the capture. The point where I'm stuck is the silence check: how do I get the level in the chunk of data?

I've added some really simple untested code above, but it should do the job you want.

— Nick Fortescue
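
As a rough illustration of "getting the level" of a chunk, here are two common measures. This is a sketch under assumptions, not the answer's method: `peak_level` and `rms_level` are hypothetical helper names, and RMS is a technique swapped in here that the thread itself does not use.

```python
import math
from array import array


def peak_level(samples):
    """Largest absolute sample value (0 for an empty chunk)."""
    return max((abs(s) for s in samples), default=0)


def rms_level(samples):
    """Root-mean-square level, a rough loudness estimate."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


chunk = array('h', [-3000, -2500, -10, 5])  # mostly negative-going signal
print(max(chunk), peak_level(chunk), round(rms_level(chunk), 1))
```

A plain max() only sees positive excursions, so for the chunk above it reports 5 while the absolute peak is 3000. Either measure can then be compared against a threshold chosen by experiment.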

My previous version had a bug: it wasn't handling the sign correctly. I now use the library function array() to parse the data properly.

— Nick Fortescue

The WAV file format is a container; it can hold audio encoded with various codecs (such as GSM or MP3), some of them far from 'the simplest in the world'.

— Jacek Konieczny
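
The case this thread relies on is the uncompressed PCM one, which, as far as I know, is the only kind Python's standard wave module reads and writes. A minimal sketch that writes a tiny WAV into memory and reads its header back (the sample values are made up for illustration):

```python
import io
import wave
from struct import pack

# Write a tiny mono 16-bit WAV into memory, then read its header back.
buf = io.BytesIO()
with wave.open(buf, 'wb') as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)          # 2 bytes per sample = 16-bit
    wf.setframerate(44100)
    wf.writeframes(pack('<4h', 0, 1000, -1000, 0))

with wave.open(io.BytesIO(buf.getvalue()), 'rb') as wf:
    params = wf.getparams()
    print(params.comptype, params.sampwidth, params.nframes)
```

A compression type of 'NONE' indicates plain PCM; files using other codecs would need a different library.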

2

I believe the "output=True" option when opening the stream isn't necessary for recording and, moreover, it seems to cause "IOError: [Errno Input overflowed] -9981" on my device. Otherwise, thanks for the code sample, it was very helpful.

— Binus

19

Thanks to cryo for the improved version, on which I based my tested code below:

#Instead of adding silence at start and end of recording (values=0) I add the original audio . This makes audio sound more natural as volume is >0. See trim()
#I also fixed issue with the previous code - accumulated silence counter needs to be cleared once recording is resumed.

from array import array
from struct import pack
from sys import byteorder
import copy
import pyaudio
import wave

THRESHOLD = 500  # audio levels not normalised.
CHUNK_SIZE = 1024
SILENT_CHUNKS = 3 * 44100 / 1024  # about 3sec
FORMAT = pyaudio.paInt16
FRAME_MAX_VALUE = 2 ** 15 - 1
NORMALIZE_MINUS_ONE_dB = 10 ** (-1.0 / 20)
RATE = 44100
CHANNELS = 1
TRIM_APPEND = RATE // 4  # integer division so it can be used in slice indices

def is_silent(data_chunk):
    """Returns 'True' if below the 'silent' threshold"""
    return max(data_chunk) < THRESHOLD

def normalize(data_all):
    """Amplify the volume out to max -1dB"""
    # MAXIMUM = 16384
    normalize_factor = (float(NORMALIZE_MINUS_ONE_dB * FRAME_MAX_VALUE)
                        / max(abs(i) for i in data_all))

    r = array('h')
    for i in data_all:
        r.append(int(i * normalize_factor))
    return r

def trim(data_all):
    _from = 0
    _to = len(data_all) - 1
    for i, b in enumerate(data_all):
        if abs(b) > THRESHOLD:
            _from = max(0, i - TRIM_APPEND)
            break

    for i, b in enumerate(reversed(data_all)):
        if abs(b) > THRESHOLD:
            _to = min(len(data_all) - 1, len(data_all) - 1 - i + TRIM_APPEND)
            break

    return copy.deepcopy(data_all[_from:(_to + 1)])

def record():
    """Record a word or words from the microphone and 
    return the data as an array of signed shorts."""

    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, output=True, frames_per_buffer=CHUNK_SIZE)

    silent_chunks = 0
    audio_started = False
    data_all = array('h')

    while True:
        # little endian, signed short
        data_chunk = array('h', stream.read(CHUNK_SIZE))
        if byteorder == 'big':
            data_chunk.byteswap()
        data_all.extend(data_chunk)

        silent = is_silent(data_chunk)

        if audio_started:
            if silent:
                silent_chunks += 1
                if silent_chunks > SILENT_CHUNKS:
                    break
            else: 
                silent_chunks = 0
        elif not silent:
            audio_started = True              

    sample_width = p.get_sample_size(FORMAT)
    stream.stop_stream()
    stream.close()
    p.terminate()

    data_all = trim(data_all)  # we trim before normalize as threshold applies to un-normalized wave (as well as is_silent() function)
    data_all = normalize(data_all)
    return sample_width, data_all

def record_to_file(path):
    "Records from the microphone and outputs the resulting data to 'path'"
    sample_width, data = record()
    data = pack('<' + ('h' * len(data)), *data)

    wave_file = wave.open(path, 'wb')
    wave_file.setnchannels(CHANNELS)
    wave_file.setsampwidth(sample_width)
    wave_file.setframerate(RATE)
    wave_file.writeframes(data)
    wave_file.close()

if __name__ == '__main__':
    print("Wait in silence to begin recording; wait in silence to terminate")
    record_to_file('demo.wav')
    print("done - result written to demo.wav")

— Eugene

Thanks, it works very well. In my case, I had to change return copy.deepcopy(data_all[_from:(_to + 1)]) to copy.deepcopy(data_all[int(_from):(int(_to) + 1)])

— lukassliacky
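
The int() casts are needed because, in Python 3, `/` always returns a float and slice indices must be integers. A sketch of a padded trim that sidesteps the issue with integer division; it is not the answer's code, and returning an empty array when nothing exceeds the threshold is an assumption made here:

```python
from array import array

RATE = 44100
THRESHOLD = 500
TRIM_APPEND = RATE // 4  # integer division keeps slice indices as ints


def trim(data_all, threshold=THRESHOLD, pad=TRIM_APPEND):
    """Cut leading/trailing quiet samples, keeping `pad` samples of context."""
    loud = [i for i, s in enumerate(data_all) if abs(s) > threshold]
    if not loud:
        return array('h')  # nothing above the threshold (assumed behaviour)
    start = max(0, loud[0] - pad)
    stop = min(len(data_all), loud[-1] + 1 + pad)
    return data_all[start:stop]  # slicing an array already returns a copy
```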

6

import pyaudio
import wave
from array import array

FORMAT=pyaudio.paInt16
CHANNELS=2
RATE=44100
CHUNK=1024
RECORD_SECONDS=15
FILE_NAME="RECORDING.wav"

audio=pyaudio.PyAudio() #instantiate the pyaudio

#recording prerequisites
stream=audio.open(format=FORMAT,channels=CHANNELS, 
                  rate=RATE,
                  input=True,
                  frames_per_buffer=CHUNK)

#starting recording
frames=[]

for i in range(0,int(RATE/CHUNK*RECORD_SECONDS)):
    data=stream.read(CHUNK)
    data_chunk=array('h',data)
    vol=max(data_chunk)
    if(vol>=500):
        print("something said")
        frames.append(data)
    else:
        print("nothing")
    print("\n")


#end of recording
stream.stop_stream()
stream.close()
audio.terminate()
#writing to file
wavfile=wave.open(FILE_NAME,'wb')
wavfile.setnchannels(CHANNELS)
wavfile.setsampwidth(audio.get_sample_size(FORMAT))
wavfile.setframerate(RATE)
wavfile.writeframes(b''.join(frames))#append frames recorded to file
wavfile.close()

I think this will help. It's a simple script that checks whether there is silence or not. If silence is detected it does not record; otherwise it records.

— deenu
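
The per-chunk gating in the script above can be isolated into a function and tested without audio hardware. A minimal sketch, assuming 16-bit samples in native byte order; `keep_loud_chunks` is a hypothetical name, and using the absolute peak (rather than max alone) is a choice made here:

```python
from array import array


def keep_loud_chunks(chunks, threshold=500):
    """Return the raw bytes of every chunk whose absolute peak reaches the threshold."""
    kept = []
    for raw in chunks:
        samples = array('h', raw)  # interpret bytes as native-endian 16-bit samples
        if samples and max(abs(s) for s in samples) >= threshold:
            kept.append(raw)
    return b''.join(kept)
```

Note that dropped chunks are simply spliced out, so the resulting audio is not continuous in time; the output can be passed to writeframes() as in the script.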

3

The pyaudio website has many examples that are quite short and clear: http://people.csail.mit.edu/hubert/pyaudio/

Update, 14 December 2019 - main example from the website linked above, from 2017:


"""PyAudio Example: Play a WAVE file."""

import pyaudio
import wave
import sys

CHUNK = 1024

if len(sys.argv) < 2:
    print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
    sys.exit(-1)

wf = wave.open(sys.argv[1], 'rb')

p = pyaudio.PyAudio()

stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

data = wf.readframes(CHUNK)

while data:  # readframes() returns an empty bytes object at end of file
    stream.write(data)
    data = wf.readframes(CHUNK)

stream.stop_stream()
stream.close()

p.terminate()

— simjega
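
In Python 3, readframes() returns bytes, so the end of the file shows up as an empty bytes object. The chunked read loop can be checked without a sound card by reading from memory; this is only a sketch, and `count_chunks` is a hypothetical helper that counts reads where playback would call stream.write():

```python
import io
import wave


def count_chunks(wav_bytes, chunk=1024):
    """Read a WAV in `chunk`-frame pieces; stop on the empty bytes object."""
    reads = 0
    with wave.open(io.BytesIO(wav_bytes), 'rb') as wf:
        data = wf.readframes(chunk)
        while data:              # b'' is falsy, so the loop ends at end of file
            reads += 1           # stream.write(data) would go here
            data = wf.readframes(chunk)
    return reads


# Build 2500 frames of silence in memory to exercise the loop.
buf = io.BytesIO()
with wave.open(buf, 'wb') as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(44100)
    wf.writeframes(b'\x00\x00' * 2500)

print(count_chunks(buf.getvalue()))
```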

0

You might want to look at csounds, too. It has several APIs, including Python. It might be able to interact with an A-D interface and gather sound samples.

— S.Lott