Automatic quantification of human interaction behaviors based on language information has been shown to be effective in psychotherapy research domains such as marital therapy and cancer care. Existing systems typically use a moving-window approach where the target behavior construct is first quantified based on observations inside a window, such as a fixed number of words or turns, and then integrated over all the windows in that interaction. Given a behavior of interest, it is important to employ the appropriate length of observation, since too short a window might not contain sufficient information. Unfortunately, the link between behavior and observation length for lexical cues has not been well studied and it is not clear how these requirements relate to the characteristics of the target behavior construct. Therefore, in this paper, we investigate how the choice of window length affects the efficacy of language-based behavior quantification, by analyzing (a) the similarity between system predictions and human expert assessments for the same behavior construct and (b) the consistency in relations between predictions of related behavior constructs. We apply our analysis to a large and diverse set of behavior codes that are used to annotate real-life interactions and find that behaviors related to negative affect can be quantified from just a few words whereas those related to positive traits and problem solving require much longer observation windows. On the other hand, constructs that describe dysphoric affect do not appear to be quantifiable from language information alone, regardless of how long they are observed. We compare our findings with related work on behavior quantification based on acoustic vocal cues as well as with prior work on thin slices and human personality predictions and find that, in general, they are in agreement.
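
The moving-window scheme described above can be illustrated with a minimal sketch. It is not the system studied in this paper. It assumes word-level windows, a plain average as the integration step, and a toy lexicon-based scorer. The names `sliding_windows`, `quantify_interaction`, `toy_negativity_score`, and `NEGATIVE_WORDS` are hypothetical and introduced only for illustration.

```python
from typing import Callable, List

# Hypothetical toy lexicon, used only to make the example runnable.
NEGATIVE_WORDS = {"never", "hate", "blame"}


def sliding_windows(words: List[str], window_len: int, step: int = 1) -> List[List[str]]:
    """Return windows of `window_len` words, advancing by `step` words."""
    if window_len <= 0 or step <= 0:
        raise ValueError("window_len and step must be positive")
    if len(words) <= window_len:
        # The interaction is no longer than one window: use it whole.
        return [words] if words else []
    last_start = len(words) - window_len
    return [words[i:i + window_len] for i in range(0, last_start + 1, step)]


def toy_negativity_score(window: List[str]) -> float:
    """Fraction of words in the window found in the toy lexicon (assumption)."""
    return sum(w.lower() in NEGATIVE_WORDS for w in window) / len(window)


def quantify_interaction(
    words: List[str],
    window_len: int,
    score_window: Callable[[List[str]], float],
    step: int = 1,
) -> float:
    """Score each window, then integrate by averaging (an assumed choice)."""
    windows = sliding_windows(words, window_len, step)
    if not windows:
        raise ValueError("cannot quantify an empty interaction")
    scores = [score_window(w) for w in windows]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    words = "you never listen and i hate that".split()
    # Vary the observation length to see how the interaction-level score changes.
    for n in (3, 7):
        print(n, quantify_interaction(words, n, toy_negativity_score))
```

Varying `window_len` while holding the scorer fixed is the kind of comparison the analysis relies on: the same interaction is quantified under different observation lengths and the resulting scores are then compared against expert assessments.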