The technology used in smart homes has improved to learn user preferences from feedback in order to provide convenience to the user in the home environment. Most smart homes learn a single uniform model of the users' thermal preference, which generally fails when the pool of occupants includes people who differ in age, gender, and location. Because thermal sensation differs from user to user, it is a challenge for a smart home to learn a personalized preference for each occupant without forgetting the policies learned for the others. A smart home with a single optimal policy may fail to provide comfort when a new user with a different preference is integrated into the home. In this paper, we propose POSHS, a Bayesian reinforcement learning algorithm that can approximate the current occupant state in a partially observable environment using the occupant's thermal preference and then decide whether the occupant is new or belongs to the pool of previously observed users. We then compare POSHS with an LSTM-based algorithm on learning and estimating the current state of the occupant while also taking optimal actions to reduce the number of time steps required to set the preferences. We perform these experiments with up to 5 simulated human models, each based on hierarchical reinforcement learning. The results show that POSHS can approximate the current user state from the user's temperature and humidity preferences alone, and that it also reduces the number of time steps the human model needs to set the optimal temperature and humidity in the presence of the smart home.
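To make the occupant-identification idea concrete, the following is a minimal sketch of a Bayesian belief update over occupant identities. It is not the POSHS algorithm itself. The occupant profiles, the Gaussian noise model, the flat likelihood used for an unseen occupant, and all names (`PROFILES`, `NEW_DENSITY`, `update_belief`, `most_likely`) are assumptions introduced here for illustration only.

```python
import math

# Hypothetical known occupants: preferred (temperature in deg C, relative humidity in %).
PROFILES = {"alice": (22.0, 40.0), "bob": (26.0, 55.0)}
NEW = "new"  # label for the "previously unseen occupant" hypothesis

# Assumed noise around each occupant's preferred setpoint.
TEMP_STD, HUM_STD = 1.0, 5.0
# Assumed flat likelihood for an unseen occupant: uniform density over a
# 15 deg C x 50 % window of plausible setpoints.
NEW_DENSITY = 1.0 / (15.0 * 50.0)


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def uniform_belief():
    """Start with equal probability for every known occupant and for NEW."""
    names = list(PROFILES) + [NEW]
    return {name: 1.0 / len(names) for name in names}


def update_belief(belief, observation):
    """One Bayes step: posterior is proportional to prior times likelihood."""
    temp, hum = observation
    unnormalized = {}
    for name, prior in belief.items():
        if name == NEW:
            likelihood = NEW_DENSITY
        else:
            mean_t, mean_h = PROFILES[name]
            likelihood = (gaussian_pdf(temp, mean_t, TEMP_STD)
                          * gaussian_pdf(hum, mean_h, HUM_STD))
        unnormalized[name] = prior * likelihood
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


def most_likely(belief):
    """Return the hypothesis with the highest posterior probability."""
    return max(belief, key=belief.get)


if __name__ == "__main__":
    belief = uniform_belief()
    # Observed setpoints chosen by the current (hidden) occupant.
    for obs in [(22.3, 42.0), (21.8, 39.0)]:
        belief = update_belief(belief, obs)
    print(most_likely(belief), belief)
```

In this sketch, a setpoint far from every stored profile leaves the flat "new" hypothesis with most of the posterior mass, which is one simple way to flag an unseen occupant without overwriting the profiles of known ones.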