§RESEARCH — Technical Report

AtlasFlux: Fine-Tuning Qwen2.5-7B for Malaysian Cultural and Linguistic Contexts

A Comprehensive Technical Report

Muhammad Nabil
Rainspeed Labs / AtlasFlux AI
June 2026
Apache 2.0
ABSTRACT

Large language models (LLMs) have revolutionised natural language processing, yet their performance in underrepresented languages and cultural contexts remains severely limited. This paper presents atlasflux-qwen-7b-1.0, a fine-tuned variant of the Qwen2.5-7B model adapted specifically for Malaysian Bahasa Melayu, colloquial slang (Manglish), and local cultural knowledge.

The model was fine-tuned using LoRA (Low-Rank Adaptation) on a custom-built dataset of 2,968 instruction-response pairs. We discuss the training methodology, dataset construction, evaluation strategy, deployment challenges, and cost-effective inference solutions. The model is available under the Apache 2.0 license.

1. Introduction

1.1 The Problem

The rapid advancement of LLMs has largely favoured high-resource languages, leaving linguistically diverse regions underserved. In Malaysia, speakers move between Bahasa Melayu, Manglish, and numerous regional dialects, and few AI models handle these local registers well.

1.3 Project Goals

  • To fine-tune an open-source 7B-parameter LLM (Qwen2.5-7B) on a curated, high-quality dataset of Malaysian-centric examples.
  • To minimise computational cost by using LoRA and 4-bit quantisation, enabling training on a single Google Colab T4 GPU.
  • To release the fine-tuned model under a permissive license (Apache 2.0) to encourage commercial adoption.

2. Model Architecture

2.1 Base Model Specifications

Total Parameters: 7.61 billion
Non-embedding Parameters: 6.53 billion
Number of Layers: 28
Attention Heads (Query): 28
Key-Value Heads (GQA): 4
Context Length: 131,072 tokens
Generation Limit: 8,192 tokens
Vocabulary Size: 152,064
Architecture: RoPE, SwiGLU, RMSNorm
Multilingual Support: 29+ languages
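
The practical effect of grouped-query attention (4 key-value heads serving 28 query heads) can be illustrated with a rough KV-cache estimate. The sketch below takes the layer and head counts from the specifications above; the head dimension of 128 and the 2-byte (fp16/bf16) cache entries are assumptions, not values stated in this report.

```python
# Rough KV-cache size for a grouped-query attention (GQA) model.
# Layer and head counts come from the specifications above; HEAD_DIM and
# BYTES_PER_VALUE are assumptions made for this sketch.

NUM_LAYERS = 28
NUM_QUERY_HEADS = 28
NUM_KV_HEADS = 4
HEAD_DIM = 128          # assumed, not stated in the report
BYTES_PER_VALUE = 2     # assumed fp16/bf16 cache entries


def kv_cache_bytes(num_tokens: int, kv_heads: int) -> int:
    """Bytes needed to cache keys and values for num_tokens tokens."""
    # Factor 2 accounts for storing both a key and a value per head.
    per_token = 2 * NUM_LAYERS * kv_heads * HEAD_DIM * BYTES_PER_VALUE
    return num_tokens * per_token


if __name__ == "__main__":
    for tokens in (2_048, 8_192, 131_072):
        gqa = kv_cache_bytes(tokens, NUM_KV_HEADS)
        mha = kv_cache_bytes(tokens, NUM_QUERY_HEADS)
        print(f"{tokens:>7} tokens: GQA {gqa / 2**20:6.0f} MiB, "
              f"one KV head per query head {mha / 2**20:6.0f} MiB")
```

Under these assumptions the cache is seven times smaller than it would be with one key-value head per query head, which matters most at long context lengths.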

3. Dataset Construction

3.1 Design Philosophy

We constructed a custom dataset comprising 2,968 instruction-response pairs in JSONL format, carefully curated to reflect three categories:

  • Bahasa Melayu Standard (~70%): Educational content, general knowledge, technical explanations
  • Colloquial Slang / Manglish (~20%): Everyday conversational phrases, informal expressions, 'mamak talk'
  • Regional Dialects (~10%): Kelantanese, Kedahan, Terengganuan, Johorean, Sabahan, Sarawakian
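
A minimal sketch of how such a JSONL file might be parsed and its category mix checked is given below. The field names (instruction, response, category), the category labels, and the sample records are all hypothetical; the report does not specify the actual schema.

```python
import json
from collections import Counter

# Illustrative records only: field names, category labels, and texts are
# assumptions made for this sketch, not samples from the actual dataset.
SAMPLE_JSONL = """
{"instruction": "example instruction 1", "response": "example response 1", "category": "standard"}
{"instruction": "example instruction 2", "response": "example response 2", "category": "standard"}
{"instruction": "example instruction 3", "response": "example response 3", "category": "manglish"}
{"instruction": "example instruction 4", "response": "example response 4", "category": "dialect"}
"""

REQUIRED_FIELDS = ("instruction", "response", "category")


def load_pairs(jsonl_text: str) -> list[dict]:
    """Parse one JSON object per non-empty line and check required fields."""
    records = []
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"line {line_number}: missing fields {missing}")
        records.append(record)
    return records


def category_shares(records: list[dict]) -> dict[str, float]:
    """Fraction of records that fall into each category."""
    counts = Counter(record["category"] for record in records)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def target_counts(total: int, shares: dict[str, float]) -> dict[str, int]:
    """Approximate number of pairs per category for a given dataset size."""
    return {name: round(total * share) for name, share in shares.items()}


if __name__ == "__main__":
    print(category_shares(load_pairs(SAMPLE_JSONL)))
    print(target_counts(2968, {"standard": 0.7, "manglish": 0.2, "dialect": 0.1}))
```

Because the stated shares are approximate, the rounded per-category counts need not sum exactly to 2,968.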

4. Fine-Tuning Methodology

4.1 Parameter-Efficient Fine-Tuning (PEFT) with LoRA

To minimise computational cost, we employed LoRA. Only 0.26% of the total parameters (approx. 20 million out of 7.6 billion) were trained.

Rank (r): 16
LoRA Alpha: 16
Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Dropout: 0.0
Bias: none
Gradient Checkpointing: Enabled (Unsloth)
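
As a sanity check on the trainable-parameter count, the number of LoRA weights can be derived from the configuration above: each adapted linear layer gains r × (d_in + d_out) weights. The sketch below assumes a hidden size of 3584, an MLP intermediate size of 18944, and a head dimension of 128; these dimensions are assumptions taken from the publicly documented Qwen2.5-7B configuration and are not stated in this report. Under those assumptions, rank 16 yields about 40.4 million adapter weights, whereas the roughly 20 million quoted above corresponds to rank 8; which setting the actual run used cannot be determined from the report alone.

```python
# LoRA adds two matrices per adapted linear layer: A (r x d_in) and
# B (d_out x r), i.e. r * (d_in + d_out) trainable weights.
# All model dimensions below are assumptions for this sketch.

NUM_LAYERS = 28
HIDDEN = 3584            # assumed hidden size
INTERMEDIATE = 18944     # assumed MLP intermediate size
KV_DIM = 4 * 128         # 4 key-value heads x assumed head dimension 128

# (d_in, d_out) for each target module inside one transformer block
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int) -> int:
    """Total trainable LoRA weights across all layers for a given rank."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGETS.values())
    return per_layer * NUM_LAYERS


if __name__ == "__main__":
    total_params = 7.61e9  # total parameters from Section 2.1
    for rank in (8, 16):
        count = lora_params(rank)
        print(f"r={rank}: {count:,} trainable weights "
              f"({100 * count / total_params:.2f}% of the base model)")
```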

4.2 Training Setup

Batch Size per Device: 1
Gradient Accumulation: 8
Effective Batch Size: 8
Learning Rate: 2e-4
Optimizer: AdamW 8-bit
Warmup Steps: 5
Max Steps: 500
Max Sequence Length: 2048 tokens
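
The relationship between these settings and the dataset size can be worked out directly. The short sketch below assumes a single GPU and one training example per sequence (no packing), neither of which is stated explicitly here.

```python
# Training-schedule arithmetic from the settings above.
# Assumes one GPU and no sequence packing (assumptions for this sketch).

PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 8
MAX_STEPS = 500
DATASET_SIZE = 2968
NUM_DEVICES = 1  # assumed


def effective_batch_size() -> int:
    """Examples contributing to each optimizer step."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES


def examples_seen() -> int:
    """Total examples processed over the whole run."""
    return effective_batch_size() * MAX_STEPS


def approx_epochs() -> float:
    """Number of passes over the dataset implied by MAX_STEPS."""
    return examples_seen() / DATASET_SIZE


if __name__ == "__main__":
    print(f"effective batch size: {effective_batch_size()}")
    print(f"examples seen:        {examples_seen()}")
    print(f"approximate epochs:   {approx_epochs():.2f}")
```

Under these assumptions, 500 steps correspond to 4,000 examples, or roughly 1.35 passes over the 2,968 pairs.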

5. Evaluation

5.1 Qualitative Evaluation

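Prompt: Siapa yang membina AtlasFlux AI? [Who built AtlasFlux AI?]
Model Output: AtlasFlux AI dibina oleh Muhammad Nabil, seorang pelajar UiTM Cawangan Melaka Kampus Alor Gajah dalam Diploma Pengurusan Teknologi Pejabat. [AtlasFlux AI was built by Muhammad Nabil, a student at UiTM Cawangan Melaka, Kampus Alor Gajah, in the Diploma in Office Technology Management.]

Prompt: Apa itu Rainspeed Labs? [What is Rainspeed Labs?]
Model Output: Rainspeed Labs Solutions adalah syarikat di sebalik pembangunan AtlasFlux AI, beroperasi di Bukit Gambir, Tangkak, Johor. [Rainspeed Labs Solutions is the company behind the development of AtlasFlux AI, operating in Bukit Gambir, Tangkak, Johor.]

Prompt: Aku nak gi mana? (Kelantan) [Where do I want to go?]
Model Output: (Generated appropriate dialect response with correct grammar)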

6. Deployment & Cost Analysis

6.1 Common Obstacles

GPU memory constraints

Solution: Use 4-bit quantisation (QLoRA) to reduce memory to ~4-5 GB.

Outcome: Successful; inference possible on T4.
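
The memory figure can be approximated from the parameter count alone. The sketch below computes weight storage at different precisions; it deliberately ignores quantisation scales, activations, and the KV cache, so real usage is somewhat higher than the raw 4-bit number, which is in line with the ~4-5 GB quoted above. The 16 GB capacity used for the T4 is the card's nominal memory size and is not stated in this report.

```python
# Back-of-the-envelope weight memory at different precisions.
# Ignores quantisation metadata, activations, and KV cache (simplification).

TOTAL_PARAMS = 7.61e9   # total parameters from Section 2.1
T4_MEMORY_GB = 16       # nominal T4 capacity, an assumption for this sketch


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Gigabytes (10^9 bytes) needed to store the weights alone."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for label, bits in (("fp16", 16), ("int8", 8), ("4-bit", 4)):
        size = weight_memory_gb(TOTAL_PARAMS, bits)
        headroom = T4_MEMORY_GB - size
        print(f"{label:>5}: {size:5.2f} GB weights, {headroom:5.2f} GB left on a T4")
```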

Cold start latency

Solution: Implement keep-alive pings or use dedicated endpoints.

Outcome: Acceptable for low-traffic phases.

Platform inconsistency

Solution: Prefer serverless, per-token pricing platforms or self-host with vLLM.

Outcome: No ideal solution found; project discontinued before full deployment.

6.2 Cost-Effective Inference

Provider            Input Price        Output Price
Together AI         $0.27 – $0.50      $0.40 – $3.00
Groq                $0.05 – $0.59      $0.08 – $0.79
AWS Bedrock         $1.50 – $25.00
Self-hosted (T4)    ~$0.50/hour
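
An hourly self-hosting rate is not directly comparable with per-token pricing; the conversion depends on sustained throughput. The sketch below shows that conversion for the ~$0.50/hour figure. The throughput values are purely hypothetical placeholders, since no throughput was measured in this report, and the calculation assumes the GPU is fully utilised for the whole hour.

```python
# Convert an hourly GPU rate into a cost per million generated tokens.
# Throughput values are hypothetical; the report gives no measurements.

HOURLY_RATE_USD = 0.50  # self-hosted T4 figure from the table above


def cost_per_million_tokens(hourly_usd: float, tokens_per_second: float) -> float:
    """USD per one million tokens at a sustained throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    for tps in (10, 20, 100):  # hypothetical throughputs
        cost = cost_per_million_tokens(HOURLY_RATE_USD, tps)
        print(f"{tps:>4} tokens/s -> ${cost:.2f} per million tokens")
```

At low utilisation the effective cost per token rises accordingly, which is one reason per-token serverless pricing can be attractive for low-traffic phases.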

7. Technical Obstacles

Key technical hurdles included memory-related errors (solved via 4-bit quantisation), dataset loading failures (solved via JSONL conversion), and PEFT version conflicts (solved by pinning versions: peft==0.10.0, transformers==4.40.0).
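
One way to catch such version conflicts early is to compare the installed packages against the pins before training starts. The following is a minimal sketch using only the standard library; the helper names are hypothetical and the pins are the two versions named above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions named in this section.
PINS = {"peft": "0.10.0", "transformers": "4.40.0"}


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(pins: dict, lookup=installed_version) -> dict:
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for package, wanted in pins.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINS)
    if problems:
        for package, (wanted, found) in problems.items():
            print(f"{package}: expected {wanted}, found {found}")
    else:
        print("All pinned versions match.")
```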

8. Availability & Licensing

Repository: rainspeed/atlasflux-qwen-7b-1.0
License: Apache 2.0
Base Model: Qwen/Qwen2.5-7B-Instruct

9. Future Work

  • Expand the dataset to at least 10,000 instruction-response pairs, including more dialect examples.
  • Benchmark against Malay-specific tasks (e.g., MalayMMLU, Malay sentiment analysis) to quantify performance improvements.
  • Implement safety filtering (e.g., using content moderation APIs) to reduce harmful or biased outputs.
  • Provide quantised GGUF versions (4-bit, 8-bit) for local deployment with llama.cpp or Ollama.
  • Integrate with RAG to ground answers in a trusted knowledge base, reducing hallucinations.

10. Conclusion

AtlasFlux demonstrates that fine-tuning a medium-sized LLM (7B parameters) with a modest but high-quality dataset can produce a model capable of understanding Malaysian cultural and linguistic nuances at a fraction of the cost of training from scratch.

Final note: For production applications, retrieval-augmented generation (RAG) using a small, cost-effective language model may be more practical than deploying a custom-fine-tuned 7B model. The author has since pivoted to RAG for the ai.atlasflux.my website.
