chore: initial snapshot for gitea/github upload

@@ -0,0 +1,159 @@
# OpenAI Audio Transcription Guardrail Translation Handler

Handler for processing OpenAI's audio transcription endpoint (`/v1/audio/transcriptions`) with guardrails.

## Overview

This handler processes audio transcription requests and responses by:

1. Applying guardrails to the transcribed text output
2. Returning the input unchanged (since the input is an audio file, not text)

## Data Format

### Input Format

The input is an audio file, which cannot be guardrailed directly (it is binary data, not text).

```json
{
  "model": "whisper-1",
  "file": "<audio file>",
  "response_format": "json",
  "language": "en"
}
```

### Output Format

```json
{
  "text": "This is the transcribed text from the audio file."
}
```

Or with additional metadata:

```json
{
  "text": "This is the transcribed text from the audio file.",
  "duration": 3.5,
  "language": "en"
}
```

## Usage

The handler is discovered and applied automatically when guardrails are used with the audio transcription endpoint.

### Example: Using Guardrails with Audio Transcription

```bash
curl -X POST 'http://localhost:4000/v1/audio/transcriptions' \
  -H 'Authorization: Bearer your-api-key' \
  -F 'file=@audio.mp3' \
  -F 'model=whisper-1' \
  -F 'guardrails=["pii_mask"]'
```

The guardrail is applied to the **output** transcribed text only.

### Example: PII Masking in Transcribed Text

```bash
curl -X POST 'http://localhost:4000/v1/audio/transcriptions' \
  -H 'Authorization: Bearer your-api-key' \
  -F 'file=@meeting_recording.mp3' \
  -F 'model=whisper-1' \
  -F 'guardrails=["mask_pii"]' \
  -F 'response_format=json'
```

If the audio contains: "My name is John Doe and my email is john@example.com"

The transcription output will be: "My name is [NAME_REDACTED] and my email is [EMAIL_REDACTED]"
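
As a rough illustration of the kind of transformation a PII-masking guardrail performs, here is a toy masking function. This regex-based sketch is purely illustrative (it only handles email addresses) and is not the actual guardrail implementation, which is a pluggable integration:

```python
import re

def toy_mask_pii(text: str) -> str:
    """Illustrative stand-in for a PII-masking guardrail (emails only)."""
    # Replace anything that looks like an email address with a redaction token.
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[EMAIL_REDACTED]", text)

print(toy_mask_pii("My email is john@example.com"))  # My email is [EMAIL_REDACTED]
```

A real guardrail would also detect names, phone numbers, and other PII classes, typically via an NER model or an external moderation API.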

### Example: Content Moderation on Transcriptions

```bash
curl -X POST 'http://localhost:4000/v1/audio/transcriptions' \
  -H 'Authorization: Bearer your-api-key' \
  -F 'file=@audio.wav' \
  -F 'model=whisper-1' \
  -F 'guardrails=["content_moderation"]'
```

## Implementation Details

### Input Processing

- **Status**: Not applicable
- **Reason**: Input is an audio file (binary data), not text
- **Result**: Request data returned unchanged

### Output Processing

- **Field**: `text` (string)
- **Processing**: Applies the guardrail to the transcribed text
- **Result**: Updated text in the response

## Use Cases

1. **PII Protection**: Automatically redact personally identifiable information from transcriptions
2. **Content Filtering**: Remove or flag inappropriate content in transcribed audio
3. **Compliance**: Ensure transcriptions meet regulatory requirements
4. **Data Sanitization**: Clean up transcriptions before storage or further processing

## Extension

Override these methods to customize behavior:

- `process_output_response()`: Customize how transcribed text is processed
- `process_input_messages()`: Currently a no-op, but can be overridden if needed
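
A subclass override might look like the following sketch. The stand-in base class keeps the example runnable on its own; in the real module you would subclass the `OpenAIAudioTranscriptionHandler` imported from litellm, whose base `process_output_response` applies the guardrail:

```python
import asyncio

# Stand-in base class so the sketch runs standalone; the real base class
# applies the guardrail inside process_output_response.
class OpenAIAudioTranscriptionHandler:
    async def process_output_response(self, response, guardrail_to_apply, **kwargs):
        return response

class TrimmingTranscriptionHandler(OpenAIAudioTranscriptionHandler):
    """Example override: strip stray whitespace after guardrails run."""
    async def process_output_response(self, response, guardrail_to_apply, **kwargs):
        response = await super().process_output_response(
            response, guardrail_to_apply, **kwargs
        )
        if isinstance(getattr(response, "text", None), str):
            response.text = response.text.strip()
        return response

class FakeResponse:
    text = "  hello world  "

resp = asyncio.run(
    TrimmingTranscriptionHandler().process_output_response(FakeResponse(), None)
)
print(resp.text)  # hello world
```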

## Supported Call Types

- `CallTypes.transcription` - Synchronous audio transcription
- `CallTypes.atranscription` - Asynchronous audio transcription

## Notes

- Input processing is a no-op since audio files cannot be text-guardrailed
- Only the transcribed text output is processed
- Guardrails apply after transcription is complete
- Both sync and async call types use the same handler
- Works with all Whisper models and response formats

## Common Patterns

### Transcribe and Redact PII

```python
import litellm

response = litellm.transcription(
    model="whisper-1",
    file=open("interview.mp3", "rb"),
    guardrails=["mask_pii"],
)

# response.text will have PII redacted
print(response.text)
```

### Async Transcription with Guardrails

```python
import asyncio

import litellm


async def transcribe_with_guardrails():
    response = await litellm.atranscription(
        model="whisper-1",
        file=open("audio.mp3", "rb"),
        guardrails=["content_filter"],
    )
    return response.text


text = asyncio.run(transcribe_with_guardrails())
```

@@ -0,0 +1,13 @@
"""OpenAI Audio Transcription handler for Unified Guardrails."""

from litellm.llms.openai.transcriptions.guardrail_translation.handler import (
    OpenAIAudioTranscriptionHandler,
)
from litellm.types.utils import CallTypes

guardrail_translation_mappings = {
    CallTypes.transcription: OpenAIAudioTranscriptionHandler,
    CallTypes.atranscription: OpenAIAudioTranscriptionHandler,
}

__all__ = ["guardrail_translation_mappings", "OpenAIAudioTranscriptionHandler"]
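
The mapping is a plain dict keyed by call type, so a dispatcher can resolve the handler with a simple lookup. The sketch below uses stand-in definitions for litellm's `CallTypes` enum and the handler class so the pattern is runnable on its own:

```python
from enum import Enum

# Stand-ins for litellm.types.utils.CallTypes and the handler class.
class CallTypes(Enum):
    transcription = "transcription"
    atranscription = "atranscription"

class OpenAIAudioTranscriptionHandler:
    pass

guardrail_translation_mappings = {
    CallTypes.transcription: OpenAIAudioTranscriptionHandler,
    CallTypes.atranscription: OpenAIAudioTranscriptionHandler,
}

def resolve_handler(call_type: CallTypes):
    """Return a handler instance for the call type, or None if unmapped."""
    handler_cls = guardrail_translation_mappings.get(call_type)
    return handler_cls() if handler_cls else None

print(type(resolve_handler(CallTypes.atranscription)).__name__)
# OpenAIAudioTranscriptionHandler
```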

@@ -0,0 +1,117 @@
"""
OpenAI Audio Transcription Handler for Unified Guardrails

This module provides guardrail translation support for OpenAI's audio transcription endpoint.
The handler processes the output transcribed text (the input is audio, so there is no text to guardrail).
"""

from typing import TYPE_CHECKING, Any, Optional

from litellm._logging import verbose_proxy_logger
from litellm.llms.base_llm.guardrail_translation.base_translation import BaseTranslation
from litellm.types.utils import GenericGuardrailAPIInputs

if TYPE_CHECKING:
    from litellm.integrations.custom_guardrail import CustomGuardrail
    from litellm.utils import TranscriptionResponse


class OpenAIAudioTranscriptionHandler(BaseTranslation):
    """
    Handler for processing OpenAI audio transcription responses with guardrails.

    This class provides methods to:
    1. Process output transcription text (post-call hook)

    Note: Input processing is not applicable since the input is an audio file,
    not text. Only the transcribed text output is processed.
    """

    async def process_input_messages(
        self,
        data: dict,
        guardrail_to_apply: "CustomGuardrail",
        litellm_logging_obj: Optional[Any] = None,
    ) -> Any:
        """
        Process input - not applicable for audio transcription.

        The input is an audio file, not text, so there is nothing to apply
        guardrails to. This method returns the data unchanged.

        Args:
            data: Request data dictionary containing the audio file
            guardrail_to_apply: The guardrail instance (unused)

        Returns:
            Unmodified data (audio files don't need text guardrails)
        """
        verbose_proxy_logger.debug(
            "OpenAI Audio Transcription: Input processing not applicable "
            "(input is audio file, not text)"
        )
        return data

    async def process_output_response(
        self,
        response: "TranscriptionResponse",
        guardrail_to_apply: "CustomGuardrail",
        litellm_logging_obj: Optional[Any] = None,
        user_api_key_dict: Optional[Any] = None,
    ) -> Any:
        """
        Process the output transcription by applying guardrails to the transcribed text.

        Args:
            response: Transcription response object containing transcribed text
            guardrail_to_apply: The guardrail instance to apply
            litellm_logging_obj: Optional logging object
            user_api_key_dict: User API key metadata to pass to guardrails

        Returns:
            Modified response with guardrails applied to the transcribed text
        """
        if not hasattr(response, "text") or response.text is None:
            verbose_proxy_logger.debug(
                "OpenAI Audio Transcription: No text in response to process"
            )
            return response

        if isinstance(response.text, str):
            original_text = response.text
            # Create a request_data dict with response info and user API key metadata
            request_data: dict = {"response": response}

            # Add user API key metadata with prefixed keys
            user_metadata = self.transform_user_api_key_dict_to_metadata(
                user_api_key_dict
            )
            if user_metadata:
                request_data["litellm_metadata"] = user_metadata

            inputs = GenericGuardrailAPIInputs(texts=[original_text])
            # Include model information from the response if available
            if hasattr(response, "model") and response.model:
                inputs["model"] = response.model
            guardrailed_inputs = await guardrail_to_apply.apply_guardrail(
                inputs=inputs,
                request_data=request_data,
                input_type="response",
                logging_obj=litellm_logging_obj,
            )
            guardrailed_texts = guardrailed_inputs.get("texts", [])
            response.text = guardrailed_texts[0] if guardrailed_texts else original_text

            verbose_proxy_logger.debug(
                "OpenAI Audio Transcription: Applied guardrail to transcribed text. "
                "Original length: %d, New length: %d",
                len(original_text),
                len(response.text),
            )
        else:
            verbose_proxy_logger.debug(
                "OpenAI Audio Transcription: Unexpected text type: %s. Expected string.",
                type(response.text),
            )

        return response
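
To see the post-call flow end to end without a live proxy, the happy path above can be exercised with stub objects. The stubs below are stand-ins for litellm's `TranscriptionResponse` and `CustomGuardrail` (the uppercasing "guardrail" is just for demonstration), and `process_output_response` here is a simplified free-function mirror of the handler method:

```python
import asyncio

class StubTranscriptionResponse:
    """Minimal stand-in for litellm's TranscriptionResponse."""
    def __init__(self, text):
        self.text = text

class UppercaseGuardrail:
    """Toy guardrail: returns the guarded texts uppercased."""
    async def apply_guardrail(self, inputs, request_data, input_type, logging_obj):
        return {"texts": [t.upper() for t in inputs["texts"]]}

async def process_output_response(response, guardrail):
    # Mirrors the handler's happy path: guard the text, write it back.
    if not isinstance(getattr(response, "text", None), str):
        return response
    result = await guardrail.apply_guardrail(
        inputs={"texts": [response.text]},
        request_data={"response": response},
        input_type="response",
        logging_obj=None,
    )
    texts = result.get("texts", [])
    response.text = texts[0] if texts else response.text
    return response

resp = asyncio.run(
    process_output_response(StubTranscriptionResponse("hello world"), UppercaseGuardrail())
)
print(resp.text)  # HELLO WORLD
```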