Cascaded STST

Demo for cascaded speech-to-speech translation (STST), mapping source speech in any of the languages Whisper supports to target speech in English. The demo uses OpenAI's Whisper Base model to translate the source speech into English text, and Microsoft's SpeechT5 TTS model to convert that English text into speech.
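The two-stage cascade can be sketched in Python as follows. This is a minimal sketch, not the demo's actual source. It assumes the Hugging Face `transformers` library and the `openai/whisper-base`, `microsoft/speecht5_tts` and `microsoft/speecht5_hifigan` checkpoints. `CascadedSTST` and `build_hf_stages` are hypothetical names. The speaker embedding is assumed to be a `(1, 512)` x-vector tensor supplied by the caller. Converting the output to 16-bit PCM is also an assumption about what the playback widget expects.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

# A stage that maps (waveform, sampling rate) -> English text.
Translator = Callable[[np.ndarray, int], str]
# A stage that maps English text -> float waveform in [-1, 1].
Synthesiser = Callable[[str], np.ndarray]


@dataclass
class CascadedSTST:
    """Hypothetical helper that chains speech translation and TTS."""
    translate: Translator      # source speech -> English text (Whisper in the demo)
    synthesise: Synthesiser    # English text -> English speech (SpeechT5 in the demo)
    output_sampling_rate: int = 16_000  # SpeechT5 generates 16 kHz audio

    def __call__(self, audio: np.ndarray, sampling_rate: int) -> tuple[int, np.ndarray]:
        text = self.translate(audio, sampling_rate)  # stage 1: speech -> text
        speech = self.synthesise(text)               # stage 2: text -> speech
        # Assumption: the player wants 16-bit PCM, so clip and rescale the float audio.
        pcm = (np.clip(speech, -1.0, 1.0) * 32767).astype(np.int16)
        return self.output_sampling_rate, pcm


def build_hf_stages(speaker_embedding, device: str = "cpu") -> tuple[Translator, Synthesiser]:
    """Builds both stages with transformers (needs torch and network access).

    `speaker_embedding` is assumed to be a (1, 512) torch tensor (an x-vector).
    """
    import torch
    from transformers import (SpeechT5ForTextToSpeech, SpeechT5HifiGan,
                              SpeechT5Processor, pipeline)

    asr = pipeline("automatic-speech-recognition", model="openai/whisper-base", device=device)
    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    tts = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts").to(device)
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan").to(device)

    def translate(audio: np.ndarray, sampling_rate: int) -> str:
        # task="translate" asks Whisper to output English text for any supported source language.
        out = asr({"raw": audio, "sampling_rate": sampling_rate},
                  generate_kwargs={"task": "translate"})
        return out["text"]

    def synthesise(text: str) -> np.ndarray:
        inputs = processor(text=text, return_tensors="pt")
        with torch.no_grad():
            # The vocoder turns SpeechT5's spectrogram into a waveform.
            speech = tts.generate_speech(inputs["input_ids"].to(device),
                                         speaker_embedding.to(device), vocoder=vocoder)
        return speech.cpu().numpy()

    return translate, synthesise
```

Keeping the stages injectable means either model can be swapped out, for example a larger Whisper checkpoint, and the cascade logic can be exercised offline with stub functions in place of the real models.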