TECHNOLOGY

Cohere's New AI Models: Bridging the Language Gap

San Francisco, USA | Sat Oct 26 2024
Cohere has unveiled two new AI models, Aya Expanse 8B and 35B, under its Aya project, whose goal is to close the language gap in AI technology. Both models are now available on Hugging Face and support 23 languages. According to Cohere's blog post, the 35B-parameter model delivers state-of-the-art multilingual performance, while the 8B model makes those capabilities more accessible to researchers around the world.

The Aya project, launched by Cohere for AI last year, aims to extend the reach of foundation models beyond English. Earlier this year the group released Aya 101, a large language model covering 101 languages, alongside the Aya dataset for training models in other languages. Aya Expanse uses similar techniques to Aya 101: Cohere's work on bridging the language gap has centered on data arbitrage, preference training for performance and safety, and model merging. In benchmark tests, the new models outperformed similar-sized models from Google, Mistral, and Meta.

Cohere's approach to avoiding gibberish in AI-generated text is notable. It uses data arbitrage to avoid reliance on synthetic data, which can be problematic for low-resource languages. The company has also worked out how to steer models toward global preferences that account for cultural and linguistic diversity, improving both performance and safety.

The central challenge with non-English language models is finding sufficient data for training. English dominates government, finance, and business, which makes English-language data far easier to gather. Benchmarking models across languages is also tricky because translation quality varies. Other developers have released datasets to support this research, such as OpenAI's dataset covering 14 languages.

Cohere has been active lately in other areas as well, adding image search to Embed 3 and enhancing fine-tuning for its Command R model.
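Since the article notes that the models are available on Hugging Face, here is a minimal sketch of loading and prompting one of them with the transformers library. The repository id and the chat-template usage are assumptions based on how Hugging Face-hosted chat models are typically packaged, not details confirmed in the article; check the model card for the exact name and license terms.

```python
# Minimal sketch: loading an Aya Expanse model from Hugging Face with transformers.
# The repo id below is an assumption; consult the model card for the exact name,
# license terms, and any required access approval.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-expanse-8b"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit on a single GPU
    device_map="auto",
)

# A multilingual prompt, formatted with the model's chat template.
messages = [{"role": "user", "content": "¿Cómo estás hoy?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```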
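The article also names model merging as one of the techniques behind Aya Expanse, without describing Cohere's recipe. The sketch below illustrates only the simplest common form of merging, uniform weight averaging of two checkpoints that share an architecture; the checkpoint names are hypothetical and this is not presented as Cohere's method.

```python
# Minimal sketch of one common form of model merging: uniform averaging of the
# weights of two checkpoints with identical architectures. Generic illustration
# only, not Cohere's published recipe; the repo ids are hypothetical.
from transformers import AutoModelForCausalLM

model_a = AutoModelForCausalLM.from_pretrained("org/checkpoint-a")  # hypothetical
model_b = AutoModelForCausalLM.from_pretrained("org/checkpoint-b")  # hypothetical

state_a, state_b = model_a.state_dict(), model_b.state_dict()
merged_state = {}
for name, tensor_a in state_a.items():
    # Average each parameter tensor elementwise with its counterpart.
    merged_state[name] = (tensor_a + state_b[name]) / 2.0

model_a.load_state_dict(merged_state)         # reuse model_a as the merged model
model_a.save_pretrained("merged-checkpoint")  # write out for later use
```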

questions

    In what ways do the accessibility and performance claims of Cohere’s new models address the systemic challenges of language diversity in AI development?
    Are there hidden agendas behind Cohere's preference training for 'global preferences'?
    How might the performance metrics used to benchmark these AI models be biased, and what steps are taken to mitigate these biases?
