London-based AI startup KAIKAKU.AI has unveiled Epicure, a family of three compact AI models trained on a dataset of 4.14 million recipes spanning seven languages. The models hold a compressed form of culinary knowledge that fits into roughly 2 megabytes, smaller than a typical compressed audio track. Rather than storing recipes, they turn ingredients into a navigable mathematical space, offering a new way to interact with food information.
Key Takeaways
- KAIKAKU.AI has developed Epicure, a suite of three AI models trained on 4.14 million multilingual recipes.
- The models do not store recipes; they encode learned relationships among 1,790 ingredients, each described by a vector of 300 numbers.
- Three variants (Cooc, Chem, and Core) cater to different analytical needs, focusing on ingredient co-occurrence, flavor chemistry, or a blend of both.
- The compact size (about 2 MB) and narrow scope of Epicure offer advantages in reliability and efficiency over general-purpose AI models for food-related tasks.
The core concept behind Epicure, as detailed in a paper published on arXiv by KAIKAKU.AI CEO Josef Chen and researcher Jakub Radzikowski, is to represent culinary knowledge not as discrete recipes but as a multidimensional map of ingredients. Each of the 1,790 recognized ingredients is defined by a vector of 300 numbers. This structure allows for mathematical manipulation of ingredient relationships, enabling queries about flavor profiles, substitutions, and culinary pairings across different cuisines.
Chen shared on X that this culinary dataset, rounded to 4.1 million recipes in seven languages, has been distilled into just 2 megabytes of data. This is achieved by keeping only the learned relationships between ingredients rather than the recipes themselves. The size follows from multiplying the number of ingredients (1,790) by the dimensionality of each ingredient's representation (300) and the storage required per dimension (approximately 4 bytes). That gives 2,148,000 bytes, or roughly 2.05 megabytes when a megabyte is counted as 1,048,576 bytes.
Launching our new paper on arXiv: we trained the largest multilingual food model ever built.
4.1M recipes. 7 languages. 1,790 ingredients. 300 dimensions.
All of human cooking compressed into 2 megabytes.
— Josef Chen (@josefchen) May 26, 2026
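The size estimate quoted above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes each of the 300 values is stored in exactly 4 bytes (as a 32-bit float would be) and ignores any file-format overhead, neither of which the article confirms.

```python
# Rough size of an embedding table with one vector per ingredient.
NUM_INGREDIENTS = 1_790
DIMENSIONS = 300
BYTES_PER_VALUE = 4  # assumption: one 32-bit float per dimension


def embedding_table_bytes(rows: int, cols: int, bytes_per_value: int) -> int:
    """Return the raw storage needed for a rows x cols table of numbers."""
    return rows * cols * bytes_per_value


total_bytes = embedding_table_bytes(NUM_INGREDIENTS, DIMENSIONS, BYTES_PER_VALUE)
size_mb = total_bytes / 1_000_000      # decimal megabytes
size_mib = total_bytes / (1024 * 1024)  # binary megabytes (MiB)

print(f"{total_bytes:,} bytes = {size_mb:.2f} MB = {size_mib:.2f} MiB")
# prints: 2,148,000 bytes = 2.15 MB = 2.05 MiB
```

The "2.05" figure therefore corresponds to binary megabytes; in decimal megabytes the same table is about 2.15 MB.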
This approach draws parallels with advances in natural language processing such as Google's word2vec, which showed that words can be represented as vectors that capture semantic relationships, so that arithmetic on the vectors corresponds to operations on word meanings. Epicure applies the same principle to ingredients. For example, "steering" the ingredient vector for beef towards an "American" context might suggest pairings like bread, lettuce, or beer. Pointing it towards "Southeast Asia" instead would bring up ingredients like soy sauce, ginger, and sesame oil.
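The word2vec-style idea of "closeness" between vectors is usually measured with cosine similarity. The sketch below shows that lookup on a made-up table; the three-dimensional vectors and their values are hypothetical stand-ins for Epicure's 300-dimensional ones, and the function names are illustrative rather than part of any released API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, table, k=2, exclude=()):
    """Return the k ingredient names whose vectors are most similar to query."""
    scored = [
        (cosine_similarity(query, vec), name)
        for name, vec in table.items()
        if name not in exclude
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical toy vectors, invented for illustration only.
TOY_VECTORS = {
    "beef": [0.9, 0.1, 0.0],
    "bread": [0.8, 0.3, 0.1],
    "soy sauce": [0.2, 0.9, 0.1],
    "ginger": [0.1, 0.9, 0.2],
    "vanilla": [0.0, 0.1, 0.9],
}

print(nearest(TOY_VECTORS["beef"], TOY_VECTORS, k=2, exclude=("beef",)))
# prints: ['bread', 'soy sauce']
```

Because the lookup only ranks names already in the table, it can never return an ingredient outside the vocabulary.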
The paper introduces a technique referred to as SLERP rotation. SLERP (spherical linear interpolation) moves a vector along the arc between two directions on a sphere rather than along a straight line, so the result keeps its length while its direction changes. By rotating an ingredient's vector towards a specific culinary direction, users can discover new pairings or identify ingredients that share similar characteristics within that context. For instance, starting with chicken and rotating it towards a "Tex-Mex" direction might reveal common pairings in that cuisine.
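A minimal sketch of that kind of rotation, using the textbook SLERP formula, is shown below. It is not taken from the paper: the toy vectors, the `tex_mex_direction` vector, and the helper names are all hypothetical, and the paper's exact formulation may differ.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def slerp(start, target, t):
    """Rotate start towards target by fraction t (0 = start, 1 = target)."""
    a, b = normalize(start), normalize(target)
    cos_theta = max(-1.0, min(1.0, dot(a, b)))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < 1e-9:
        if cos_theta < 0:
            raise ValueError("opposite vectors: rotation plane is undefined")
        return a  # already pointing the same way
    w_a = math.sin((1 - t) * theta) / sin_theta
    w_b = math.sin(t * theta) / sin_theta
    return [w_a * x + w_b * y for x, y in zip(a, b)]


def rank(query, table, exclude=()):
    """Ingredient names sorted from most to least similar to query."""
    q = normalize(query)
    names = [n for n in table if n not in exclude]
    return sorted(names, key=lambda n: dot(q, normalize(table[n])), reverse=True)


# Hypothetical toy data, invented for illustration only.
TOY_VECTORS = {
    "chicken": [1.0, 0.0, 0.0],
    "cumin": [0.1, 1.0, 0.0],
    "tortilla": [0.6, 0.8, 0.0],
    "butter": [0.9, -0.1, 0.4],
}
tex_mex_direction = [0.0, 1.0, 0.0]

for t in (0.0, 0.5):
    rotated = slerp(TOY_VECTORS["chicken"], tex_mex_direction, t)
    print(t, rank(rotated, TOY_VECTORS, exclude=("chicken",)))
```

In this toy setup the unrotated chicken vector ranks butter first, while a halfway rotation towards the hypothetical Tex-Mex direction brings tortilla to the top, which is the kind of shift the article describes.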
Epicure is available in three distinct variants designed for specific applications. Cooc leverages recipe co-occurrence data, identifying ingredients that are frequently found together in actual dishes. Chem focuses on flavor chemistry, analyzing shared aroma compounds using data from the FlavorDB chemical database. The Core model combines insights from both Cooc and Chem. This differentiation allows users to select the model best suited to their needs, whether they are seeking common culinary partners (Cooc) or ingredients with similar chemical flavor profiles (Chem).
For example, asking Cooc what pairs with chocolate might yield common dessert ingredients like cocoa powder, vanilla, or almonds. In contrast, querying Chem with chocolate could return ingredients like toffee, fudge, or ganache, which share underlying flavor compounds. This distinction is crucial for chefs requiring different information, such as finding ingredient substitutes versus mapping flavor compatibility.
Epicure differs from general-purpose AI models like ChatGPT in its narrow scope. It has no knowledge outside its defined space of 1,790 ingredients, and its answers are drawn from that fixed vocabulary, so it cannot hallucinate in the way a text-generating model can. Its suggestions stay grounded in the recipe and flavor data it was trained on, avoiding the fabricated suggestions that larger, more generalized models can produce.
The previous state-of-the-art model in this area, FlavorGraph (2021), was limited to English recipes and chemical data. Epicure surpasses this by incorporating a significantly larger, multilingual dataset and optimizing vocabulary for greater efficiency. Its practical applications are numerous, ranging from assisting chefs in identifying ingredient equivalents across different cuisines to aiding food product developers in finding compatible flavor substitutes. The ability of specialized, smaller models to outperform large generalist models in specific tasks is a significant trend in AI development.
KAIKAKU.AI has made the Epicure models publicly available on Hugging Face, alongside an interactive ingredient map accessible at epicure.kaikaku.ai. While the research paper and trained models are released, the full training code has not yet been made available.
Long-Term Technological Impact
The development of Epicure signifies a critical advancement in the application of specialized AI within niche domains. By demonstrating the power of highly compressed, yet functionally rich, AI models, KAIKAKU.AI is paving the way for more accessible and efficient AI solutions across various industries. This approach challenges the paradigm of relying on massive, general-purpose models for every task. For the blockchain and Web3 space, this could translate into more lightweight and decentralized AI agents capable of performing complex analytical tasks without requiring immense computational resources. The focus on structured data representation and mathematical manipulation of knowledge aligns with principles of efficient data encoding and verifiable computation, which are foundational to blockchain scalability and AI integration in decentralized applications. Furthermore, the development of multilingual, domain-specific models like Epicure could empower global Web3 ecosystems by providing localized AI functionalities, enhancing user experience and fostering broader adoption of blockchain technologies.
Based on materials from: decrypt.co
