Speech evaluation is an essential component in computer-assisted language learning (CALL). While speech evaluation for English has been popular, automatic speech scoring for low-resource languages remains challenging. Work in this area has focused on monolingual-specific designs and handcrafted features stemming from resource-rich languages like English. Such approaches are often difficult to generalize to other languages, especially if we also want to consider suprasegmental qualities such as rhythm. In this work, we examine three different languages that possess distinct rhythm patterns: English (stress-timed), Malay (syllable-timed), and Tamil (mora-timed). We exploit robust feature representations inspired by music processing and vector representation learning. Empirical validations show consistent gains for all three languages when predicting pronunciation, rhythm, and intonation performance.