Level: Intermediate to Advanced
Prerequisite: See below
Large language models (LLMs) are trained on vast quantities of textual data, enabling them to perform a broad array of natural language processing (NLP) tasks, such as summarization, question answering, paraphrasing, and numerous others, with exceptional precision and effectiveness. Tremendous media attention has focused on these capabilities. How can a business with vast proprietary data leverage these models for highly precise and domain-specific answers? In this course, students will explore a variety of solutions using Amazon SageMaker.
Instructor Krish Krishnan will guide students through four approaches to training LLMs on enterprise data. Students will set up the AWS platform with LLM Hub. Using OpenSearch data and Amazon SageMaker, students will gain hands-on experience with each approach:
Prompt-based learning involves fine-tuning an LLM using factual knowledge that is represented as question-answer (prompt completion) pairs. This fine-tuning process is supervised and normally involves updating the model through gradient descent. This approach does not require a large amount of data and is run for a small number of epochs.
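As a rough illustration of what question-answer (prompt completion) pairs can look like on disk, the sketch below serializes a few pairs as JSON Lines. The field names "prompt" and "completion", the sample questions, and the helper to_jsonl are assumptions made for this example; the schema expected by a particular fine-tuning job may differ.

```python
import json

# Hypothetical question-answer pairs; real pairs would be drawn from enterprise data.
qa_pairs = [
    ("What is the return window for online orders?",
     "Online orders can be returned within 30 days."),
    ("Which regions does the premium plan cover?",
     "The premium plan covers North America and Europe."),
]


def to_jsonl(pairs):
    """Serialize (question, answer) tuples as one JSON object per line."""
    lines = []
    for question, answer in pairs:
        # "prompt" and "completion" are assumed field names, not a required schema.
        record = {"prompt": question, "completion": answer}
        lines.append(json.dumps(record))
    return "\n".join(lines)


jsonl_text = to_jsonl(qa_pairs)
print(jsonl_text)
```

Each line is an independent training record, which keeps small supervised datasets like this easy to inspect and append to.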
Domain adaptation modifies the LLM to align with the enterprise domain, producing responses that are domain-centric. The original base model is further trained in a self-supervised manner with domain-specific unlabeled data to update the model through gradient descent. This approach usually requires a larger amount of data, a custom vocabulary, and a tokenizer.
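To make "self-supervised" concrete, the sketch below shows one common way unlabeled text can supply its own training targets: each input block is paired with the same tokens shifted by one position (next-token prediction). This is an illustration under assumptions, not necessarily the method used in the course; the whitespace split stands in for a real tokenizer, and make_causal_lm_examples and the sample sentence are hypothetical.

```python
def make_causal_lm_examples(text, block_size=8):
    """Split unlabeled text into (input, target) token blocks.

    The target is the input shifted by one token, so no human labels are needed.
    """
    tokens = text.split()  # assumption: whitespace split stands in for a real tokenizer
    examples = []
    for start in range(0, len(tokens) - 1, block_size):
        # Take one extra token so every input position has a next-token target.
        chunk = tokens[start:start + block_size + 1]
        examples.append((chunk[:-1], chunk[1:]))
    return examples


# Hypothetical domain-specific sentence.
domain_text = (
    "The policy covers water damage caused by burst pipes "
    "but excludes gradual seepage"
)
examples = make_causal_lm_examples(domain_text, block_size=4)
for inputs, targets in examples:
    print(inputs, "->", targets)
```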
Augmentation supplements the base LLM with external custom domain knowledge through information retrieval (IR). In this approach, a knowledge base containing domain-specific documents is used together with an IR mechanism that retrieves relevant pieces of information such as passages or sections, referred to as "context."
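The retrieval step described above can be sketched in a few lines. This minimal example ranks passages by simple word overlap with the question and places the best match into the prompt as context. The scoring rule, the prompt layout, and the names retrieve, build_prompt, and knowledge_base are assumptions chosen for illustration; a production IR mechanism would be considerably more capable.

```python
def retrieve(query, documents, top_k=1):
    """Rank documents by the number of lowercase words they share with the query."""
    query_words = set(query.lower().split())
    scored = []
    for doc in documents:
        overlap = len(query_words & set(doc.lower().split()))
        scored.append((overlap, doc))
    # Highest overlap first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(query, context_passages):
    """Place the retrieved passages ahead of the question as context."""
    context = "\n".join(context_passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical knowledge base of domain-specific passages.
knowledge_base = [
    "Refunds are issued within 30 days of purchase.",
    "Support is available on weekdays from 9 to 5.",
]
question = "When are refunds issued?"
prompt = build_prompt(question, retrieve(question, knowledge_base))
print(prompt)
```

The base model itself is unchanged in this approach; only the prompt it receives carries the domain knowledge.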
Vector search is an advanced approach in which textual data is transformed into semantically rich, contextualized embeddings via a text embedding model, enabling efficient and accurate information retrieval.
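A minimal sketch of the idea, assuming embeddings have already been produced by some text embedding model: each text is stored with its vector, and a query is answered by ranking stored vectors by cosine similarity. The three-dimensional vectors, the index contents, and the search helper are hypothetical; real embeddings have hundreds or thousands of dimensions and are usually served from a vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; a real system would compute these with an embedding model.
index = {
    "refund policy": [0.9, 0.1, 0.0],
    "support hours": [0.1, 0.9, 0.1],
    "shipping rates": [0.0, 0.2, 0.9],
}


def search(query_vector, index, top_k=2):
    """Return the top_k texts whose vectors are most similar to the query vector."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Assumed embedding of a query such as "how do I get my money back".
query_vector = [0.8, 0.2, 0.1]
results = search(query_vector, index)
print(results)
```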
After this workshop, you will be ready to apply these skills—in your own business, with your own data, and on your own platforms—to bring the benefits of LLMs to your enterprise.
You Will Learn
- Fundamentals of Amazon SageMaker
- LLM refresher and AWS Hub
- Amazon SageMaker endpoint configuration
- OpenSearch data sets for LLM
- Advanced setup and configuration for AWS and SageMaker
- LLM fine-tuning
- Domain-centric LLM deployment
- Context-driven LLM
- Embeddings and Vector DB for deploying LLM
- Exploring outcomes
- Alternative choices
Geared To
- ML engineers
- Data scientists
- Data and analytics developers
- Data engineers
- Architects
Prerequisites
- Basic understanding of Python and SQL
- AWS product familiarity
Laptop Setup
Students must bring their laptops to class.
Machine Requirements:
- Windows or Mac OS X
- 64-bit operating system
- 8 GB available RAM, 16 GB preferred
Setup:
Laptop setup is required BEFORE the conference. Instructions will be emailed to registrants before the event.
There is no time allotted in class for laptop preparation.
* Enrollment is limited to 40 attendees.