Why, How, and Where to Deploy GenAI and Other SOTA AI to Edge Devices

Event: InCabin USA

Session date: Wednesday 11th June, 2025

Hear from:

Peter Kristiansen

Head of Business Development,

Embedl

Generative AI, particularly Large Language Models (LLMs) and Transformers, has
become a cornerstone of modern applications, powering advancements in natural language
processing, autonomous systems, and real-time decision-making. However, the traditional
cloud-based inference approach poses significant challenges, including privacy concerns,
connectivity constraints, and high operational costs. These issues drive the need to deploy
AI models directly on edge devices, such as vehicles and embedded systems, where
real-time, on-device processing is essential.

This talk explores the latest strategies for deploying LLMs and Transformers in
resource-constrained edge environments while maintaining performance, efficiency, and
reliability. We examine the current generation of systems-on-chip (SoCs) from industry
leaders such as Texas Instruments, NVIDIA, Qualcomm, and Ambarella, focusing on their
capabilities for executing large AI models efficiently. We also discuss the trade-offs
involved in model compression techniques, including quantization, pruning, and knowledge
distillation. While these techniques enable deployment on limited hardware, they may
compromise model fidelity, so accuracy can degrade in real-world applications even when
benchmark results look favourable.
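
As a rough illustration of the fidelity trade-off mentioned above, the following minimal sketch applies symmetric per-tensor int8 quantization to a handful of made-up weights and measures the round-trip error. It is not taken from the talk: the function names and the sample values are assumptions chosen for illustration, and production toolchains typically use more elaborate schemes (per-channel scales, calibration data, and so on).

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: returns (integer values, scale)."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integer values back to approximate floating-point weights."""
    return [v * scale for v in q]


def max_abs_error(original, restored):
    """Largest absolute difference between original and restored weights."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Hypothetical weights, chosen only to illustrate the effect.
weights = [0.02, -0.75, 0.31, 1.27, -0.004]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
error = max_abs_error(weights, restored)

print("quantized:", q)
print("scale:", scale)
print("max abs error:", error)
```

In this toy case the smallest weight is rounded to zero, which hints at how information can be lost even though the overall error stays within half a quantization step. Whether such losses matter depends on the model and the task, which is why evaluation on real-world data is needed alongside benchmarks.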