Large language models (LLMs) have revolutionised natural language processing, yet their performance in underrepresented languages and cultural contexts remains severely limited. This paper presents atlasflux-qwen-7b-1.0, a fine-tuned variant of the Qwen2.5-7B model adapted specifically for Malaysian Bahasa Melayu, colloquial slang (Manglish), and local cultural knowledge.
The model was fine-tuned using LoRA (Low-Rank Adaptation) on a custom-built dataset of 2,968 instruction-response pairs. We discuss the training methodology, dataset construction, evaluation strategy, deployment challenges, and cost-effective inference solutions. The model is available under the Apache 2.0 license.
1. Introduction
1.1 The Problem
The rapid advancement of LLMs has largely favoured high-resource languages, leaving linguistically diverse regions underserved. Malaysia, with its rich tapestry of Bahasa Melayu, Manglish, and numerous regional dialects, faces a significant gap in AI models that truly understand local communication nuances.
1.3 Project Goals
- To fine-tune an open-source 7B-parameter LLM (Qwen2.5-7B) on a curated, high-quality dataset of Malaysian-centric examples.
- To minimise computational cost by using LoRA and 4-bit quantisation, enabling training on a single Google Colab T4 GPU.
- To release the fine-tuned model under a permissive license (Apache 2.0) to encourage commercial adoption.
2. Model Architecture
2.1 Base Model Specifications
3. Dataset Construction
3.1 Design Philosophy
We constructed a custom dataset of 2,968 instruction-response pairs in JSONL format, curated to cover three categories: Bahasa Melayu, colloquial Manglish, and local cultural knowledge.
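As a rough illustration of the JSONL layout mentioned above, the sketch below parses one JSON object per line and rejects records with missing fields. The field names `instruction` and `response` are assumptions, since the exact schema is not given here, and the two sample records are invented for illustration rather than taken from the dataset.

```python
import json

# Hypothetical field names: the actual schema of the dataset is not specified.
REQUIRED_FIELDS = ("instruction", "response")


def parse_jsonl(text):
    """Parse JSONL text (one JSON object per line) into a list of dicts."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
        if missing:
            raise ValueError(f"line {line_no}: missing or empty fields {missing}")
        records.append(record)
    return records


# Illustrative records only; these are not drawn from the real dataset.
sample = "\n".join([
    json.dumps({"instruction": "Terjemahkan ke Bahasa Melayu: Good morning",
                "response": "Selamat pagi"}),
    json.dumps({"instruction": "What does 'tapau' mean?",
                "response": "'Tapau' means to take food away instead of eating in."}),
])

records = parse_jsonl(sample)
print(f"loaded {len(records)} instruction-response pairs")
```

Validating at load time like this is one way to catch malformed lines before they reach the trainer.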
4. Fine-Tuning Methodology
4.1 Parameter-Efficient Fine-Tuning (PEFT) with LoRA
To minimise computational cost, we employed LoRA. Only 0.26% of the total parameters (approx. 20 million out of 7.6 billion) were trained.
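The figure of roughly 20 million trainable parameters can be sanity-checked with a short calculation. LoRA adds two low-rank matrices to each adapted weight matrix, contributing `rank * (d_in + d_out)` parameters. The sketch below assumes rank 8 applied to all seven projection matrices in every layer, and uses the layer dimensions from the published Qwen2.5-7B configuration. Neither the rank nor the target modules are stated in this paper, so both are assumptions.

```python
# Assumed Qwen2.5-7B dimensions (taken from its published config, not from this paper).
HIDDEN = 3584         # model hidden size
INTERMEDIATE = 18944  # MLP intermediate size
KV_DIM = 512          # 4 key/value heads x 128 head dimension
NUM_LAYERS = 28

# Assumed LoRA setup: the rank and target modules are not stated in the paper.
RANK = 8
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, rank):
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def total_lora_params(targets, rank, num_layers):
    """Sum adapter parameters over all target modules in all layers."""
    per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in targets.values())
    return per_layer * num_layers


trainable = total_lora_params(TARGETS, RANK, NUM_LAYERS)
fraction = trainable / 7.6e9  # against the rounded 7.6 billion total quoted above
print(f"{trainable:,} trainable parameters")
```

Under these assumptions the count comes to about 20.2 million, roughly a quarter of one percent of the model, which is in line with the figure quoted above. The exact percentage depends on the precise total used as the denominator.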
4.2 Training Setup
5. Evaluation
5.1 Qualitative Evaluation
6. Deployment & Cost Analysis
6.1 Common Obstacles
Solution: Use 4-bit quantisation (QLoRA) to reduce memory to ~4-5 GB.
Outcome: Successful; inference possible on T4.
Solution: Implement keep-alive pings or use dedicated endpoints.
Outcome: Acceptable for low-traffic phases.
Solution: Prefer serverless, per-token pricing platforms or self-host with vLLM.
Outcome: No ideal solution found; project discontinued before full deployment.
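The ~4-5 GB figure for 4-bit quantisation can be checked with back-of-the-envelope arithmetic. The sketch below estimates weight storage only, using the rounded 7.6 billion parameter count. It ignores quantisation constants, activations, and the KV cache, so the real footprint will be somewhat higher than the computed value.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (10^9 bytes), ignoring all overhead."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 7.6e9  # rounded parameter count

fp16_gb = weight_memory_gb(PARAMS, 16)
int4_gb = weight_memory_gb(PARAMS, 4)

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
```

At 16 bits the weights alone come to about 15.2 GB, which leaves almost no headroom on a 16 GB T4. At 4 bits they drop to about 3.8 GB, consistent with the ~4-5 GB observed once overhead is included.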
6.2 Cost-Effective Inference
| Provider | Input Price | Output Price |
|---|---|---|
| Together AI | $0.27 – $0.50 | $0.40 – $3.00 |
| Groq | $0.05 – $0.59 | $0.08 – $0.79 |
| AWS Bedrock | $1.50 – $25.00 | – |
| Self-hosted (T4) | ~$0.50/hour | – |
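One way to compare the per-token rows with the hourly self-hosted row is a break-even calculation: how many tokens per hour must be served before a flat hourly GPU cost becomes cheaper than per-token billing. The sketch below assumes the per-token prices in the table are in USD per one million tokens. The table does not state its units, so treat this as an assumption and adjust if they differ.

```python
def break_even_tokens_per_hour(hourly_cost_usd, price_per_million_usd):
    """Tokens per hour above which a flat hourly cost beats per-token pricing."""
    if price_per_million_usd <= 0:
        raise ValueError("price per million tokens must be positive")
    return hourly_cost_usd / price_per_million_usd * 1_000_000


# Example inputs taken from the table: ~$0.50/hour self-hosted T4 versus
# an assumed $0.40 per million output tokens on a serverless provider.
threshold = break_even_tokens_per_hour(0.50, 0.40)
print(f"break-even: {threshold:,.0f} tokens per hour")
```

Below the break-even volume, per-token pricing is cheaper. Above it, a continuously running GPU wins, provided it can actually sustain that throughput.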
7. Technical Obstacles
Key technical hurdles included memory-related errors (solved via 4-bit quantisation), dataset loading failures (solved via JSONL conversion), and PEFT version conflicts (solved by pinning versions: peft==0.10.0, transformers==4.40.0).
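A small guard at the top of a training notebook can catch version drift before it surfaces as an obscure PEFT error. The sketch below compares installed package versions against the pins listed above, using the standard-library `importlib.metadata`. The helper name `find_mismatches` is hypothetical and not part of the project's code.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the text above.
PINS = {"peft": "0.10.0", "transformers": "4.40.0"}


def find_mismatches(pins, get_version=version):
    """Return {package: installed_version_or_None} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = installed
    return mismatches


problems = find_mismatches(PINS)
if problems:
    print("version mismatches:", problems)
else:
    print("all pinned versions match")
```

The `get_version` parameter exists only so the check can be exercised without installing the packages.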
8. Availability & Licensing
9. Future Work
- Expand the dataset to at least 10,000 instruction-response pairs, including more dialect examples.
- Evaluate on standard benchmarks (e.g., MalayMMLU, Malay sentiment analysis) to quantify performance improvements.
- Add safety filtering (e.g., using content moderation APIs) to reduce harmful or biased outputs.
- Release quantised versions (4-bit, 8-bit) for local deployment with llama.cpp or Ollama.
- Add retrieval-augmented generation (RAG) to ground answers in a trusted knowledge base, reducing hallucinations.
10. Conclusion
AtlasFlux demonstrates that fine-tuning a medium-sized LLM (7B parameters) with a modest but high-quality dataset can produce a model capable of understanding Malaysian cultural and linguistic nuances at a fraction of the cost of training from scratch.
Final note: For production applications, retrieval-augmented generation (RAG) with a small, cost-effective language model may be more practical than deploying a custom fine-tuned 7B model. The author has since pivoted to RAG for the ai.atlasflux.my website.
References
1. Qwen Team. (2024). Qwen2.5-7B-Instruct Model Card. Hugging Face. huggingface.co/Qwen/Qwen2.5-7B-Instruct
2. Qwen Team. (2024). Qwen2.5 Technical Report. arXiv:2412.15115
3. YTL AI Labs. (2025). Malaysia launches first homegrown LLM, Ilmu. The Edge Malaysia.
4. Unsloth AI. (2025). Unsloth Documentation. docs.unsloth.ai
5. Hugging Face. (2026). Inference Endpoints Documentation. huggingface.co/docs/inference-endpoints