With the development of Connected Vehicle (CV) technology, the temporal variation of roadway traffic can be captured by sharing Basic Safety Messages (BSMs) from each vehicle, using communication between vehicles as well as with transportation roadside infrastructure (e.g., traffic signals) and traffic management centers. However, the penetration of connected vehicles in the near future will be limited. BSMs from a limited number of CVs could provide an inaccurate estimate of the current average speed or average space headway. This inaccuracy in the estimated current average speed and average space headway is termed noise. Noise in the traffic data significantly reduces the prediction accuracy of a machine learning model, such as a long short-term memory (LSTM) model used to predict traffic conditions. To improve real-time prediction accuracy at low CV penetration, we developed a traffic data prediction model that combines the LSTM with a noise reduction model (either the standard Kalman filter or the Kalman filter based Rauch-Tung-Striebel (RTS) smoother). The average speed and space headway used in this study were generated from the Enhanced Next Generation Simulation (NGSIM) dataset, which contains vehicle trajectory data for every one-tenth of a second. Compared to a baseline LSTM model without any noise reduction, for 5 percent penetration of CVs, the analyses revealed that the combined LSTM/RTS model reduced the mean absolute percentage error (MAPE) from 19 percent to 5 percent for speed prediction and from 27 percent to 9 percent for space headway prediction. The overall reduction in MAPE ranged from 1 percent to 14 percent for speed prediction and from 2 percent to 18 percent for space headway prediction compared to the baseline model.
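To make the noise reduction step and the error metric concrete, the following is a minimal sketch, not the authors' implementation. It assumes a scalar random-walk state model (the true average speed or headway drifts slowly between time steps), and the function names, the process variance `q`, the measurement variance `r`, and the initial values `x0` and `p0` are all hypothetical choices made for illustration. It runs a forward Kalman filter, applies the backward RTS pass to the filter output, and computes MAPE.

```python
def kalman_filter(zs, q, r, x0, p0):
    """Forward pass of a scalar random-walk Kalman filter (illustrative assumption).

    zs: noisy measurements; q: process variance; r: measurement variance;
    x0, p0: initial state mean and variance.
    Returns filtered means/variances and one-step predicted means/variances.
    """
    xf, pf, xp, pp = [], [], [], []
    x, p = x0, p0
    for z in zs:
        # Predict: the state transition is 1, so the mean carries over
        # and the variance grows by q.
        x_pred, p_pred = x, p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p_pred / (p_pred + r)
        x = x_pred + k * (z - x_pred)
        p = (1.0 - k) * p_pred
        xp.append(x_pred)
        pp.append(p_pred)
        xf.append(x)
        pf.append(p)
    return xf, pf, xp, pp


def rts_smoother(xf, pf, xp, pp):
    """Backward Rauch-Tung-Striebel pass over the filter output above."""
    xs, ps = list(xf), list(pf)
    for t in range(len(xf) - 2, -1, -1):
        c = pf[t] / pp[t + 1]  # smoother gain (state transition is 1)
        xs[t] = xf[t] + c * (xs[t + 1] - xp[t + 1])
        ps[t] = pf[t] + c * c * (ps[t + 1] - pp[t + 1])
    return xs, ps


def mape(actual, estimate):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    total = sum(abs((a - e) / a) for a, e in zip(actual, estimate))
    return 100.0 * total / len(actual)
```

In a pipeline of the kind described above, the smoothed series rather than the raw CV-derived series would be passed to the LSTM. Note that the RTS pass uses later measurements to refine earlier estimates, so in a real-time setting it could only be applied over a window of data already received.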