In this thesis, we address data scarcity and the limitations of linguistic theory by proposing language-agnostic multi-task training methods. First, we introduce a meta-learning-based approach, meta-transfer learning, in which information is judiciously extracted from high-resource monolingual speech data and transferred to the code-switching domain. Meta-transfer learning quickly adapts the model to the code-switching task from a number of monolingual tasks by learning to learn in a multi-task learning fashion. Second, we propose a novel multilingual meta-embeddings approach to effectively represent code-switching data by acquiring useful knowledge learned in other languages, learning the commonalities of closely related languages, and leveraging lexical composition. The method is far more efficient than contextualized pre-trained multilingual models. Third, we introduce multi-task learning to integrate syntactic information as a transfer learning strategy to a language model and learn where to code-switch. To further alleviate the aforementioned issues, we propose a data augmentation method using Pointer-Gen, a neural network that uses a copy mechanism to teach the model the code-switch points from monolingual parallel sentences. We remove the dependence on linguistic theory: the model captures code-switching points by attending to input words and aligning the parallel words, without requiring any word alignments or constituency parsers. More importantly, the model can be effectively used for languages that are syntactically different, and it outperforms the linguistic theory-based models.