LLMs Deep Dive
Hands-on projects in PyTorch
Organization
The University of Chicago
Core Technologies
Python, PyTorch, HuggingFace, TorchText, spaCy
Domain
Natural Language Processing
Date
January 2023
Technical highlight
LSTM Transducer-Based Language Model
In this project, I built a language model using an LSTM (Long Short-Term Memory) transducer, a type of Recurrent Neural Network (RNN) designed to process sequences of data, like text, while remembering important information from earlier in the sequence. This is crucial in language processing, where understanding context and relationships between words is key. Training used early stopping, which halts training once validation performance stops improving and so helps prevent overfitting, and the model's effectiveness was measured by test perplexity, a metric that reflects how well it predicts the next word in a sentence.
I then used the model to generate random text and tested whether it recognizes well-formed sentences by assigning them lower perplexities (i.e., higher probabilities) than incoherent ones.
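A minimal sketch of this setup in PyTorch, assuming a word-level vocabulary and illustrative hyperparameters (the dimensions used in the project differ):

import math
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    """Word-level LSTM transducer: predicts the next token at every position."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, num_layers=1, dropout=0.0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            dropout=dropout, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, hidden=None):
        # tokens: (batch, seq_len) integer ids -> logits over the vocabulary at each step
        emb = self.embedding(tokens)
        out, hidden = self.lstm(emb, hidden)
        return self.fc(out), hidden

@torch.no_grad()
def perplexity(model, tokens):
    # Perplexity of a sequence: exp of the mean next-token cross-entropy.
    logits, _ = model(tokens[:, :-1])  # predict positions 1..T from positions 0..T-1
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
    return math.exp(loss.item())

The same perplexity function is what allows the model to rank sentences: a coherent sentence receives a lower perplexity than a scrambled one.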
To improve the model's performance, I implemented several extensions (combined in the sketch that follows this list):
1. GRU replacement: I swapped the LSTM for a GRU (Gated Recurrent Unit), a simpler and faster recurrent unit, to compare efficiency and results.
2. Deeper network: I stacked additional LSTM layers so the model could capture more complex patterns in language.
3. Dropout technique: To reduce overfitting, I added dropout, which randomly zeroes out units during training.
4. Gradient clipping: I capped the norm of the gradients during backpropagation to keep training stable and prevent exploding gradients.
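A sketch of how these extensions fit together, reusing the model interface from the sketch above; the hyperparameters, the patience value, and the val_loss_fn helper are illustrative assumptions, not the project's actual settings:

import torch
import torch.nn as nn

# Extensions 1-3: swap nn.LSTM for nn.GRU, stack more layers, and add dropout between layers.
rnn = nn.GRU(input_size=256, hidden_size=512, num_layers=3, dropout=0.3, batch_first=True)

# Extension 4: gradient clipping inside the training loop, combined with early stopping.
def train(model, train_loader, val_loss_fn, optimizer, max_epochs=20, patience=3, clip=1.0):
    best_val, bad_epochs = float("inf"), 0
    criterion = nn.CrossEntropyLoss()
    for epoch in range(max_epochs):
        model.train()
        for tokens in train_loader:                       # tokens: (batch, seq_len)
            logits, _ = model(tokens[:, :-1])
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             tokens[:, 1:].reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            # Cap the global gradient norm so one bad batch cannot destabilize training.
            torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
            optimizer.step()
        val_loss = val_loss_fn(model)                     # held-out validation loss
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:                    # early stopping
                break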
Technical highlight
Implementing the Transformer for Language Translation
In this project, I implemented the Transformer architecture, a state-of-the-art model for language translation introduced in the groundbreaking Attention Is All You Need paper. The Transformer leverages self-attention mechanisms, which allow the model to weigh the importance of different words in a sentence, making it more effective at capturing long-range dependencies compared to traditional RNNs.
I applied this model to translate French sentences into English, training it on parallel corpora of French-English sentence pairs. The self-attention mechanism enabled the model to attend to specific parts of the input sentence at each decoding step, leading to more accurate translations.
By eliminating the sequential processing required by RNNs, the Transformer allowed for much more efficient parallelization during training and reached an impressively low loss of 0.18 after just three epochs.
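A condensed sketch of the encoder-decoder setup built on torch.nn.Transformer; the dimensions are illustrative, and a learned positional embedding stands in for the sinusoidal encoding of the original paper:

import torch
import torch.nn as nn

class TranslationTransformer(nn.Module):
    """Encoder-decoder Transformer for French -> English translation (illustrative sizes)."""
    def __init__(self, src_vocab, tgt_vocab, d_model=512, nhead=8,
                 num_layers=6, dim_ff=2048, max_len=512):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)    # learned positional encoding
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt):
        # src: (batch, src_len) French token ids; tgt: (batch, tgt_len) English token ids
        positions = lambda x: torch.arange(x.size(1), device=x.device)
        src_emb = self.src_embed(src) + self.pos_embed(positions(src))
        tgt_emb = self.tgt_embed(tgt) + self.pos_embed(positions(tgt))
        # Causal mask: each decoder position attends only to earlier target tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1)).to(src.device)
        hidden = self.transformer(src_emb, tgt_emb, tgt_mask=tgt_mask)
        return self.out(hidden)                            # logits over the English vocabulary

The causal mask is what lets the decoder train on whole target sentences in parallel while still predicting each English word from only the words that precede it.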
Technical highlight
Text Summarization Pipeline Using HuggingFace Transformers
In this project, I implemented a text-summarization pipeline on the RCC cluster at the University of Chicago using the HuggingFace Transformers library. I fine-tuned the model on the CNN/DailyMail dataset (v1.0.0), selecting 10,000 examples for training and 2,000 each for validation and testing.
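The data preparation, assuming the HuggingFace datasets library; the shuffling seed is an illustrative choice:

from datasets import load_dataset

# Load CNN/DailyMail v1.0.0 and carve out the subsets described above.
raw = load_dataset("cnn_dailymail", "1.0.0")
train_ds = raw["train"].shuffle(seed=42).select(range(10_000))
val_ds = raw["validation"].shuffle(seed=42).select(range(2_000))
test_ds = raw["test"].shuffle(seed=42).select(range(2_000))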
A key aspect was generating zero-shot summaries with pre-trained models. I incorporated task-specific prompts during tokenization to boost model performance and wrote code to instantiate the Trainer class for fine-tuning.
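A sketch of the prompt-augmented tokenization and the Trainer setup, reusing train_ds and val_ds from above; the t5-small checkpoint, the "summarize: " prompt, and the training arguments are illustrative assumptions rather than the project's exact configuration:

from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

checkpoint = "t5-small"                      # illustrative pre-trained checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

def tokenize(batch):
    # Prepend a task prompt to each article before tokenization.
    inputs = tokenizer(["summarize: " + a for a in batch["article"]],
                       max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["highlights"], max_length=128, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

train_tok = train_ds.map(tokenize, batched=True, remove_columns=train_ds.column_names)
val_tok = val_ds.map(tokenize, batched=True, remove_columns=val_ds.column_names)

args = Seq2SeqTrainingArguments(output_dir="summarizer", num_train_epochs=1,
                                per_device_train_batch_size=8, evaluation_strategy="epoch")
trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=train_tok,
                         eval_dataset=val_tok, tokenizer=tokenizer,
                         data_collator=DataCollatorForSeq2Seq(tokenizer, model=model))
trainer.train()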
After fine-tuning, I compared the quality of summaries from the original and fine-tuned models using both qualitative methods and the ROUGE metric for quantitative evaluation. Additionally, I explored alternative generation mechanisms like beam search, greedy decoding, and top-k sampling to optimize summary generation.
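A sketch comparing the decoding strategies and scoring the outputs with ROUGE via the evaluate library, reusing model, tokenizer, and test_ds from the sketches above; the evaluation subset size and generation parameters are illustrative:

import evaluate

rouge = evaluate.load("rouge")

def summarize(model, tokenizer, article, strategy="beam"):
    # Generate a summary with one of the decoding strategies compared in the project.
    inputs = tokenizer("summarize: " + article, return_tensors="pt",
                       max_length=512, truncation=True)
    if strategy == "beam":
        ids = model.generate(**inputs, max_new_tokens=128, num_beams=4)
    elif strategy == "greedy":
        ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    else:  # top-k sampling
        ids = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_k=50)
    return tokenizer.decode(ids[0], skip_special_tokens=True)

# Score a small evaluation slice against the reference highlights.
sample = test_ds.select(range(100))
scores = rouge.compute(predictions=[summarize(model, tokenizer, ex["article"]) for ex in sample],
                       references=[ex["highlights"] for ex in sample])

Running this comparison once with the original checkpoint and once with the fine-tuned one gives the quantitative side of the before-and-after evaluation.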
Takeaways