Generative AI with Large Language Models: A Course Review

After diving into the depths of Deep Learning last year, I recently embarked on a new online journey with the “Generative AI with Large Language Models” course offered by DeepLearning.AI on Coursera. At USD 49, with six months to complete it, the course promises a comprehensive introduction to Generative AI, with a focus on computational and cost efficiency.

Unlike the Deep Learning Specialization, where Andrew Ng was a constant presence, here he only sets the stage each week, leaving the teaching to Antje Barth, Shelbee Eigenbrode, Mike Chambers, and Chris Fregly, with a sprinkle of guest appearances.

The course aims to equip students with an understanding of how to apply Generative AI in a cost-aware manner. While the spotlight is usually on GPT-3 or GPT-4, this class introduces smaller, less resource-intensive models, some even available for free on platforms like Hugging Face. The emphasis is on adapting these models to specific needs without diving into the nitty-gritty of statistics or coding.

A tech-savvy mindset is essential to grasp the content, as it’s designed for those who’ll be selecting and training models for business objectives. It’s not overly technical, but it might lose those with a purely business-oriented approach.

What’s Covered?

The first week introduces the technology behind Generative AI, focusing on the transformer architecture. It’s a high-level overview; for a deeper dive, I’d recommend the “Attention Is All You Need” paper and the Sequence Models course from the Deep Learning Specialization.
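For a rough sense of what sits at the heart of that architecture, here is a toy sketch of scaled dot-product attention, the mechanism the “Attention Is All You Need” paper builds on; the shapes and random values below are my own, purely for illustration.

```python
# Toy scaled dot-product attention (illustrative shapes and random values).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # how much each token attends to the others
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                                        # weighted mix of the value vectors

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))         # 4 tokens, dimension 8
print(scaled_dot_product_attention(Q, K, V).shape)            # (4, 8)
```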

Key terms like “The Prompt” and “The Completion” are explained, along with how model parameters influence the output. The concept of “prompt engineering” is introduced, showing how to enhance prompts for better results. The trade-off between compute budget, dataset size, and model size is also discussed, highlighting the benefits of using pre-trained models for developers.
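To make this vocabulary concrete, here is a minimal sketch of a prompt and its completion using a small open model; the choice of `google/flan-t5-base` from Hugging Face and the one-shot prompt are my own assumptions, not something the course mandates.

```python
# Minimal prompt -> completion example with a small open model (illustrative).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# The prompt: here with a one-shot example, a simple form of prompt engineering.
prompt = (
    "Classify the sentiment of the review.\n"
    "Review: The food was cold and the service was slow.\nSentiment: negative\n"
    "Review: The staff were friendly and the pasta was excellent.\nSentiment:"
)
inputs = tokenizer(prompt, return_tensors="pt")

# Generation parameters shape the completion: max_new_tokens caps its length,
# while do_sample, temperature and top_k control how greedy or creative it is.
outputs = model.generate(
    **inputs,
    max_new_tokens=10,
    do_sample=True,
    temperature=0.7,
    top_k=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```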

In Week 2, the course shifts its focus to model evaluation and fine-tuning, explaining the distinction between full fine-tuning and Parameter-Efficient Fine-Tuning (PEFT), as well as their respective impacts on performance. Full fine-tuning starts from a pre-trained model and updates all of its parameters, which can number in the hundreds of millions or even billions, to boost performance on a specific task. PEFT, on the other hand, is a more efficient approach that updates only a small subset of parameters. The course covers various methods for evaluating model performance and compares the two approaches in terms of computational expense versus performance gains.

A crucial concept discussed is catastrophic forgetting: after fine-tuning on a narrow task, a model can lose previously learned behaviors or answers, leading to poor results on anything beyond that task. This phenomenon highlights the complexity of training models and underscores the importance of cost control, as reaching the desired balance between task-specific performance and generalization often requires extensive trial and error.
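To put the contrast in concrete terms, here is a small sketch comparing trainable parameter counts under full fine-tuning and under LoRA, one popular PEFT method; it assumes the Hugging Face `transformers` and `peft` libraries and an illustrative FLAN-T5 checkpoint, none of which are prescribed by the text above.

```python
# Comparing trainable parameters: full fine-tuning vs. a LoRA (PEFT) adapter.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model, TaskType

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# Full fine-tuning: every parameter of the base model is trainable.
full_trainable = sum(p.numel() for p in base_model.parameters() if p.requires_grad)

# PEFT with LoRA: freeze the base model and train small low-rank adapters
# injected into the attention projections instead.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                        # rank of the adapter matrices
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q", "v"],   # query and value projections in FLAN-T5
)
peft_model = get_peft_model(base_model, lora_config)
peft_trainable = sum(p.numel() for p in peft_model.parameters() if p.requires_grad)

print(f"Full fine-tuning: {full_trainable:,} trainable parameters")
print(f"LoRA:             {peft_trainable:,} trainable parameters "
      f"({100 * peft_trainable / full_trainable:.2f}% of the model)")
```

Training only the adapters drastically cuts the memory needed for gradients and optimizer states, which is exactly the kind of cost lever the course keeps coming back to.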

The final week delves into the challenge of aligning model outputs with human expectations, mitigating toxicity, and eliminating social biases. It explores the use of reinforcement learning techniques to make models helpful, honest, and harmless. It also highlights a somewhat counterintuitive aspect: these models cannot truly reason independently. In essence, Generative AI operates by predicting the most likely next word based on the input and its training. While this can yield impressive results in tasks such as summarization or sentiment analysis, it can falter at even simple mathematical operations or general problem-solving. The course presents strategies to make problems more comprehensible to models, typically by making each step of the reasoning process more explicit.

If you are not convinced about this last point, here is an example using ChatGPT with GPT-4:

Source: ChatGPT 4 (April 11, 2024)

The actual result of this division is 1,185,100,187.80, but the model outputs 1,184,791,082.02. It’s close, but the error is around 310,000… so make sure you don’t use ChatGPT for precise computations.
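The mitigation the course points to, making each step of the reasoning explicit, is commonly known as chain-of-thought prompting. The sketch below illustrates the idea; the example problems and wording are my own, not taken from the course materials.

```python
# Chain-of-thought prompting (illustrative): the worked example spells out the
# intermediate arithmetic, nudging the model to do the same for the new question.
direct_prompt = (
    "Q: A library had 120 books. It lent out 45 and received 30 new ones. "
    "How many books does it have now?\nA:"
)

cot_prompt = (
    "Q: A cafeteria had 23 apples. It used 20 and bought 6 more. "
    "How many apples does it have now?\n"
    "A: It started with 23 apples, used 20, leaving 23 - 20 = 3, "
    "then bought 6 more, giving 3 + 6 = 9. The answer is 9.\n"
    "Q: A library had 120 books. It lent out 45 and received 30 new ones. "
    "How many books does it have now?\nA:"
)
# Sending cot_prompt instead of direct_prompt to the same model typically yields
# a step-by-step answer (120 - 45 = 75, then 75 + 30 = 105) and fewer mistakes.
```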

Grading and Lab Work

Each week features a multiple-choice quiz, based on theory, which can be retaken as needed. The online labs, conducted on AWS, involve following a Jupyter notebook step by step without any coding required. While the labs are easy to set up, some comfort with the command line and with running notebooks is necessary.

Conclusion

The “Generative AI with Large Language Models” course offered by DeepLearning.AI on Coursera provides a valuable and practical introduction to the world of Generative AI. While it does not delve deeply into the technical intricacies, it offers a comprehensive overview of how to apply these models effectively in a cost-aware manner. The course is best suited for tech-savvy individuals looking to select and train models for specific business objectives. Through its focus on smaller, less resource-intensive models and the use of pre-trained models, it presents an accessible pathway for developers to integrate Generative AI into their projects. However, it’s important to remember that, as impressive as these models can be, they are not infallible, especially in tasks requiring precise computations. Overall, this course is a solid step for anyone looking to understand and harness the potential of Generative AI in their work.