Physiological Genomics

Novel method for high-throughput phenotyping of sleep in mice

Allan I. Pack, Raymond J. Galante, Greg Maislin, Jacqueline Cater, Dimitris Metaxas, Shan Lu, Lin Zhang, Randy Von Smith, Timothy Kay, Jie Lian, Karen Svenson, Luanne L. Peters


Assessment of sleep in mice currently requires initial implantation of chronic electrodes for assessment of electroencephalogram (EEG) and electromyogram (EMG) followed by time to recover from surgery. Hence, it is not ideal for high-throughput screening. To address this deficiency, a method of assessment of sleep and wakefulness in mice has been developed based on assessment of activity/inactivity either by digital video analysis or by breaking infrared beams in the mouse cage. It is based on the rule that any episode of continuous inactivity of ≥40 s is predicted to be sleep. The method gives excellent agreement in C57BL/6J male mice with simultaneous assessment of sleep by EEG/EMG recording. The average agreement over 8,640 10-s epochs in 24 h is 92% (n = 7 mice) with agreement in individual mice being 88–94%. Average EEG/EMG-determined sleep per 2-h interval across the day was 59.4 min. The estimated mean difference (bias) per 2-h interval between inactivity-defined sleep and EEG/EMG-defined sleep was only 1.0 min (95% confidence interval for mean bias −0.6 to +2.6 min). The standard deviation of differences (precision) was 7.5 min per 2-h interval with 95% limits of agreement ranging from −13.7 to +15.7 min. Although bias significantly varied by time of day (P = 0.0007), the magnitude of time-of-day differences was not large (average bias during lights on and lights off was +5.0 and −3.0 min per 2-h interval, respectively). This method has applications in chemical mutagenesis and for studies of molecular changes in brain with sleep/wakefulness.

  • sleep disorders
  • mutagenesis
  • mouse
  • phenotyping

Sleep and wakefulness are stereotypical behaviors that occur in cyclical bouts across the day. Current methods to assess sleep and wakefulness in mammalian species, including mice, rely on continuous recording of electroencephalographic (EEG) and electromyographic (EMG) signals. This method involves surgical implantation of electrodes into the skull and muscle, time to recover from surgery before recording, as well as an intensive effort to score the continuous recordings, typically in 10-s epochs (i.e., 8,640 epochs across 24 h). This methodology is not suited to high-throughput assessment of the large numbers of mice required to detect altered gene function as a result of chemical mutagenesis (for reviews of this approach, see Refs. 3, 9, 11–17) or to assess differences in multiple inbred strains (5–7, 19) that permit identification of quantitative trait loci. Chemical mutagenesis has previously proved productive in this area and was the methodology used to identify the circadian clock gene Clock (1, 18). We therefore sought to develop a new high-throughput technique that can estimate amounts of sleep and wakefulness in mice without the need for surgery and chronic implantation of electrodes. This new technique is based on the concept that the longer a bout in which the mouse is inactive, the more likely the bout is to be sleep. During short bouts of inactivity, the mouse may be immobile but awake. Inactivity was assessed in two ways: 1) by evaluating movement that breaks infrared beams in the mouse cage and 2) by digital analysis of videos. We did these studies in C57BL/6J male mice, a strain commonly used for studies of sleep in mice. A Medline search for articles published in 2004–2006 with the keywords “sleep/mouse,” restricted to those in which sleep was actually recorded, identified 69 articles, of which 32 used this mouse strain and sex. Moreover, of the eight world-wide chemical ENU mutagenesis projects assessing behavioral phenotypes summarized by Keays and Nolan (9), five use a pure C57BL/6 strain: the Jackson Laboratory (16); Riken, Genomic Sciences Center, Japan; Northwestern University (17); Novartis; and UCLA School of Medicine (14).

We present here the basis of the approach and its validation by direct comparison with EEG/EMG assessments of sleep and wakefulness in the same mice. In preliminary studies, we compared estimates of sleep and wake based on inactivity/activity, averaged across 8 C57BL/6J mice, with published pooled data on sleep and wake (7). We varied the definition used to predict that a discrete time interval was sleep by sequentially increasing the threshold duration of inactivity from ≥10 s to ≥120 s in 10-s increments. We found that the total squared error between the inactivity-based estimates and the published data was minimized when the minimum duration of inactivity used to predict sleep was set to 40 s. Thus, in this study we sought to validate this 40-s criterion based on simultaneously assessed activity/inactivity and sleep/wake from EEG/EMG recording in individual mice. Activity/inactivity was assessed in two ways: by digital video analysis of movement and by determining movement by the mouse breaking infrared beams. We show that both methods produce equivalent estimates of sleep.



The primary studies were done in C57BL/6J male mice 8–10 wk of age obtained from the Jackson Laboratory. All animal experiments were performed in accordance with the guidelines published in the National Institutes of Health Guide for the Care and Use of Laboratory Animals and were approved by the University of Pennsylvania Animal Care and Use Committee.

EEG/EMG assessment of sleep and wakefulness.

To assess sleep and wakefulness mice were surgically implanted with EEG/EMG electrodes. Briefly, animals were anesthetized by injection of ketamine (100 mg/kg ip) and xylazine (10 mg/kg ip). The skull was exposed and prepared for placement of four silver ball EEG electrodes. Two Teflon-coated EMG electrodes bared at the tips were sutured to the dorsal neck muscles. All leads from the electrodes were connected to a plastic socket connector (Plastics One), which was fixed to the skull with dental cement. Following surgery, animals were allowed to recover for 10 days before any studies were performed.

EEG and EMG signals were amplified using the Neurodata amplifier system (model M15; Astro-Med, West Warwick, RI). Signals were amplified (20,000×) and conditioned with neuroamplifiers/filters (model 15A94, Astro-Med). The filter settings for EEG signals were: low cut-off frequency (−6 dB), 0.1 Hz; and high cut-off frequency (−6 dB), 100 Hz. The filter settings for EMG signals were: low cut-off frequency (−6 dB), 10 Hz; and high cut-off frequency (−6 dB), 100 Hz. Signals were digitized at 256 samples/second/channel. All data were acquired using Grass Gamma software (Astro-Med).

In the eight mice studied (see below) we scored non-rapid-eye-movement (NREM) and rapid-eye-movement (REM) sleep and wakefulness in 10-s epochs across a 24-h period (12-h light-dark cycle; 7 AM–7 PM). Data collected using Astro-Med's Gamma software were converted to European Data Format and manually scored using the Somnologica Science analysis package (Medcare). Scoring included identification of arousals within sleep periods and removal of artifact. This allowed us to tabulate sleep stage changes and produce totals of wakefulness, NREM, and REM over desired intervals. Records were scored twice by an experienced scorer. For those epochs where there was disagreement, the epoch was rescored and the disagreement was resolved.

Assessment of activity and inactivity: breaking infrared beams.

Activity/inactivity was determined using the Opto M3 monitoring system in a cage identical to that used in the CLAMS system (Comprehensive Laboratory Animal Monitoring System; Columbus Instruments, Columbus, OH). This system has beams that are 0.5 inches apart on the horizontal plane providing a high-resolution grid covering the XY-planes. Software provides counts of beam breaks by the mouse, in 10-s epochs. For estimating sleep and wakefulness, we examined the total ambulatory counts in the XY-plane. The mouse was considered inactive if there were no such counts in a given 10-s epoch.

Assessment of activity and inactivity: digital video analysis.

Mouse movements were estimated from digital video by a “blob” analysis technique. First, we built an averaged background from a sequence of video frames in which no mouse was present. Then, the mouse region in each video frame was extracted by subtracting the background frame from the frame containing the mouse. A follow-up process treated the extracted mouse region as a blob (a rectangle enclosing the mouse whose shape changes as the mouse changes shape), and the XY-position of the center of mass of the mouse was approximated by the coordinates of the center of gravity of the blob. From this analysis, we obtained the XY-position of the center of gravity of the mouse at 10 frames/second and then calculated velocity from the distance moved by this central XY-position frame by frame. We averaged these estimates of velocity over 10-s intervals. When the average velocity was <3 pixels/s, the mouse was considered inactive.
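The velocity step described above can be illustrated with a minimal Python sketch. It assumes the centroid positions have already been extracted, one (x, y) pair per frame; the function name `epoch_inactivity` and the choice to assign zero speed to the very first frame (which has no predecessor) are illustrative assumptions, not details taken from the original software.

```python
import math

FRAME_RATE = 10        # frames per second, as stated in the text
EPOCH_S = 10           # epoch length in seconds
SPEED_THRESHOLD = 3.0  # pixels per second; below this the epoch is inactive


def epoch_inactivity(centroids, frame_rate=FRAME_RATE, epoch_s=EPOCH_S,
                     threshold=SPEED_THRESHOLD):
    """Return one boolean per complete epoch: True if the mouse was inactive.

    centroids: sequence of (x, y) blob centers, one per video frame.
    """
    frames_per_epoch = frame_rate * epoch_s
    # Frame-to-frame speed in pixels/s. The first frame has no predecessor,
    # so it is given speed 0.0 (an assumption made for this sketch).
    speeds = [0.0] + [
        math.dist(a, b) * frame_rate
        for a, b in zip(centroids, centroids[1:])
    ]
    flags = []
    for start in range(0, len(speeds) - frames_per_epoch + 1, frames_per_epoch):
        window = speeds[start:start + frames_per_epoch]
        mean_speed = sum(window) / len(window)
        flags.append(mean_speed < threshold)
    return flags


# Example: 10 s without movement, then 10 s moving 1 pixel per frame (10 pixels/s).
still = [(0.0, 0.0)] * 100
moving = [(float(i), 0.0) for i in range(1, 101)]
flags = epoch_inactivity(still + moving)
```

Frames left over after the last complete 10-s epoch are ignored in this sketch.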

Validation study: protocol.

To assess the reliability of the new method we carried out a study in eight mice. (Video data were only obtained in 7 mice.) Mice were acclimated for 5 days in the CLAMS cage, fed food and water ad libitum, and kept on a 12-h light-dark cycle with lights on at 7 AM. Following acclimation, mouse behavior was analyzed by video and by infrared beam breaking for 24 h. Thereafter, mice were anesthetized, and electrodes were implanted for recording of EEG/EMG. Following 10 days of recovery from surgery, mouse behavior was again analyzed for 24 h by infrared beam breaking and video but without the electrode cable being connected. Thereafter, the electrode cable was connected for recording of EEG/EMG, with recording of infrared beam breaking and video for another 24 h.

Statistical analysis.

Initial descriptive analyses involved assessment of epoch to epoch agreement between methods across 8,640 10-s epochs for each mouse. We first calculated percentage agreement for each mouse, i.e., percentage of 8,640 epochs where different methods gave the same prediction of sleep or wakefulness for each epoch and then summarized mouse-specific agreement across mice.
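As an illustration of the epoch-to-epoch comparison, the following Python sketch computes percentage agreement between two per-epoch sleep/wake series. The function name and the boolean encoding (True for sleep) are assumptions made for the example.

```python
def percent_agreement(predicted, reference):
    """Percentage of epochs in which two methods give the same sleep/wake label.

    predicted, reference: equal-length sequences of booleans (True = sleep),
    e.g. 8,640 values for a 24-h record scored in 10-s epochs.
    """
    if len(predicted) != len(reference):
        raise ValueError("both series must cover the same epochs")
    if not predicted:
        raise ValueError("at least one epoch is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(predicted)


# Toy example with four epochs, one of which disagrees.
agreement = percent_agreement([True, True, False, False],
                              [True, False, False, False])
```

Mouse-specific values computed this way can then be summarized across mice, for example as a mean and range.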

Formal analyses of agreement were performed using the approach described by Bland and Altman (2). Further analyses were performed by mixed-model analysis of variance (ANOVA). To do so, we divided the 24-h period into 12 2-h intervals. This gives 12 estimates of sleep based on movement that were compared with that from “gold standard” EEG/EMG recording for each mouse. We then calculated the mean bias, i.e., difference between estimates of sleep based on movement and that from gold standard EEG/EMG determination, as well as the Bland-Altman limits of agreement. Bias in this study is the expected difference between an estimate of total sleep as determined through assessment of activity and inactivity and the amount of total sleep as determined from gold standard EEG/EMG. In general, “if there is a consistent bias we can adjust for it by subtracting the mean difference from the new method” (2). Furthermore, the presence of a fixed relative bias does not alter conclusions related to statistical precision of the new method (4). This is because subtracting a fixed quantity does not affect variance. Nonconstant bias can similarly be dealt with through regression adjustment, so in general, while it is critical to characterize bias, it is the population variance around expected bias that truly reflects the utility (i.e., reliability) of the proposed methods. The limits of agreement define an interval expected to cover 95% of the differences between the two methods of measurement. The mixed-model ANOVA was used to assess the impact of having multiple 2-h intervals per mouse as well as to assess whether agreement varied by time of day.
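The basic Bland-Altman quantities can be sketched in Python as follows. This is a simplified illustration that treats all 2-h intervals as independent observations; it does not reproduce the mixed-model ANOVA, and the function name and toy data are assumptions made for the example.

```python
import statistics


def bland_altman(new_method, reference, z=1.96):
    """Return (bias, sd_of_differences, (lower_limit, upper_limit)).

    new_method, reference: paired measurements, e.g. minutes of sleep per
    2-h interval from inactivity and from EEG/EMG, respectively.
    """
    if len(new_method) != len(reference):
        raise ValueError("measurements must be paired")
    diffs = [n - r for n, r in zip(new_method, reference)]
    bias = statistics.mean(diffs)   # mean difference between methods
    sd = statistics.stdev(diffs)    # sample SD of the differences (precision)
    limits = (bias - z * sd, bias + z * sd)  # 95% limits of agreement
    return bias, sd, limits


# Toy data only; these are not values from the study.
bias, sd, limits = bland_altman([62.0, 58.0, 61.0, 63.0],
                                [60.0, 60.0, 60.0, 60.0])
```

Because subtracting a constant bias shifts every difference by the same amount, the SD returned here is unchanged by such a correction, which is why precision is treated as the more informative quantity.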

We determined in our preliminary studies that predicting sleep when there was ≥40 s of continuous inactivity resulted in a minimum prediction error relative to published group average data. To validate this criterion, we performed sensitivity analyses in which we repeated our within-animal agreement analysis varying the duration of inactivity criterion from 10 to 120 s in increments of 10 s. In this way we planned to confirm that the 40-s threshold has optimum bias and precision characteristics.
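The inactivity rule and the sensitivity analysis can be sketched in Python as below. The sketch assumes a per-epoch series of inactivity flags (from either video or beam breaks) and labels every epoch of a qualifying run as sleep, including its first epochs; function names are illustrative, and the difference is reported over the whole record rather than per 2-h interval.

```python
def predict_sleep(inactive, min_inactive_s=40, epoch_s=10):
    """Label as sleep every epoch inside a run of continuous inactivity
    lasting at least min_inactive_s seconds."""
    min_epochs = -(-min_inactive_s // epoch_s)  # ceiling division
    sleep = [False] * len(inactive)
    run_start = None
    # A trailing False acts as a sentinel that closes a run ending at the
    # last epoch of the record.
    for i, flag in enumerate(list(inactive) + [False]):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start >= min_epochs:
                for j in range(run_start, i):
                    sleep[j] = True
            run_start = None
    return sleep


def sweep_thresholds(inactive, reference_sleep,
                     thresholds=range(10, 130, 10), epoch_s=10):
    """Difference (predicted minus reference, in minutes) in total sleep
    for each candidate inactivity threshold in seconds."""
    reference_min = sum(reference_sleep) * epoch_s / 60
    results = {}
    for threshold in thresholds:
        predicted = predict_sleep(inactive, threshold, epoch_s)
        results[threshold] = sum(predicted) * epoch_s / 60 - reference_min
    return results


# Toy record: runs of 4, 2, and 5 inactive epochs separated by activity.
inactive = [True] * 4 + [False] + [True] * 2 + [False] + [True] * 5
predicted = predict_sleep(inactive)
differences = sweep_thresholds(inactive, predicted)
```

In this toy record the 2-epoch run is rejected at the 40-s threshold, while short thresholds count it as sleep and long thresholds discard the longer runs as well.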


Overall validation of estimates of sleep (wakefulness) compared with EEG/EMG.

When we used a duration of inactivity ≥40 s to estimate sleep (or the converse, wakefulness) and compared video analysis of inactivity/activity to simultaneous assessment by EEG/EMG recording, we found epoch-by-epoch agreement (8,640 epochs in each mouse) ranged from 88 to 94% across seven mice (average agreement = 92%). This excellent agreement is shown averaged across all seven mice in Fig. 1 in 2-h intervals across the day. There is the same level of agreement in individual mice in 2-h intervals (Fig. 2A, randomly selected mouse) or in a different mouse in 1-h intervals (Fig. 2B, different individual mouse). The latter is, as expected, more variable across the day, but this algorithm efficiently tracks changes in sleep as is shown by the agreement with the EEG/EMG method.

Fig. 1.

Comparison of amounts of sleep in 2-h intervals across the day averaged across 7 mice as obtained from algorithm that defines sleep as 40 s or more of inactivity by video analysis and from simultaneous assessment from electroencephalographic-electromyographic (EEG/EMG) recording.

Fig. 2.

A: comparison of sleep amounts in 2-h intervals between that estimated from inactivity from video analysis and that from simultaneous EEG/EMG recording in an individual mouse. B: same data but for a different mouse in 1-h intervals across the day.

We also examined the distribution of different bout lengths of sleep in both the lights on (Fig. 3A) and lights off (Fig. 3B) periods separately, comparing the results from the analysis based on inactivity as estimated from video analysis to that from EEG/EMG assessment of sleep. The distributions were similar (see Fig. 3), although the analysis based on inactivity underestimated the number of bouts of short duration found by EEG/EMG recording. This is to be expected since the inactivity analysis defines all bouts of sleep as ≥40 s. Changing this definition to include all bouts of inactivity shorter than 40 s leads to an overestimation of sleep since the majority of short bouts of inactivity occur during wakefulness. The strategy defining sleep based on inactivity also underestimated the longer bouts of sleep. This results from small arousal movements during sleep crossing the movement threshold defining inactivity, thereby terminating the predicted sleep bout.

Fig. 3.

Comparison of distribution of % total sleep that is made up by bouts of different durations as determined by our algorithm from video analysis (denoted activity/speed) and from EEG. The distributions are similar but our algorithm, expectedly, underestimates the number of short bouts, i.e., <40 s. It also underestimates the number of larger bouts of sleep. Data are shown for lights on (A) and lights off (B).

When we did a similar analysis comparing estimates of sleep from breaking infrared beams to EEG/EMG, we found the agreement was surprisingly poor in several mice (see Table 1). Further analysis indicated that there were epochs where the EEG/EMG analysis indicated sleep but there were beam breaks indicating ambulatory movement. The mice with the poorest agreement had the highest number of such epochs (see Table 1). When we directly examined the simultaneous video in these epochs, we could see that the mouse was asleep and was not moving, but the cable connected to the mouse's head was oscillating, as a result of minor head twitches by the mouse. We assume that for some mice the cable must have been positioned across an infrared beam resulting in artifactual counts of beam breaks. This artifact prevents us from directly validating the method based on infrared beam breaks with simultaneous EEG/EMG recording.

Table 1.

Disagreement between algorithm-defined and EEG/EMG-recorded wake and sleep

To indirectly validate this aspect of our methodology, we compared estimates of inactivity/activity and of sleep/wakefulness from simultaneous assessment of activity by infrared beam breaking and by video analysis. This comparison was done over 24 h before the mouse had surgery for implantation of EEG/EMG electrodes and after surgery, but before the cable was connected to the mouse skull. In neither condition was the cable present so that the artifact described above could not occur. When we looked at agreements for 10-s epochs of inactivity the average agreement across seven mice between the two methods was excellent. The average agreement before any surgery was 92% (range in individual mice: 89–93%) and 93% (range 92–94%) following surgery but before the cable was attached to the head of the mouse. We also found excellent agreement averaged across all mice (and in individual mice, data not shown) between estimates of sleep (wakefulness) across the day from analysis of video and from infrared beam breaking. We show in Fig. 4 the agreement averaged across all mice in 2-h intervals both before and after surgery when there was no cable connection to the mouse skull. Hence, both methods of estimating inactivity provide excellent estimates of sleep (or wakefulness).

Fig. 4.

Comparison of average estimates of percentage of sleep in 2-h intervals across the day from measurements of inactivity by digital video analysis (□) and from breaking infrared beams (•). Data are shown averaged across 7 mice before any surgery (A) and after surgery (B) but before mice were connected to a cable for assessment of EEG/EMG. In both cases there is excellent agreement between the 2 methods.

Analysis of bias and precision.

We performed analyses of bias and precision on the basis of simultaneous video analysis and EEG/EMG in seven mice and using data from 12 2-h intervals across the day in each mouse, i.e., N = 12 × 7 = 84 observations. The mean (SD) minutes of sleep estimated from video was 60.4 (29.8), while for EEG/EMG it was 59.4 (27.5) min. Overall, estimated bias was very small [1.0 min, 95% confidence interval (CI) −0.6 to +2.6] and not statistically significantly different from zero (P = 0.22). The Bland-Altman 95% limits of agreement were −13.7 to +15.7 min per 2-h interval. The SD of differences between video and EEG/EMG sleep per 2-h interval was 7.5 min, implying that <5% of cases have differences >15 min per 2-h interval. Mixed-model ANOVA demonstrated that expected bias did not vary among individual mice, validating analyses based on 2-h intervals as the units of analysis. We did observe small but statistically significant differences in expected bias across the time of day [F(11,66) = 3.5, P = 0.0007]. Predicted bias during lights on (7 AM–7 PM) and lights off (7 PM–7 AM) were +5.0 and −3.0 min per 2-h interval, respectively. Thus, while sleep predicted by inactivity had small positive bias during lights on and a small negative bias during lights off, in both cases expected bias was very small in magnitude.
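As a rough arithmetic check, the reported interval estimates can be approximately reproduced from the summary statistics alone. The Python sketch below uses a normal approximation and treats the 84 intervals as independent, which is a simplifying assumption and not the mixed-model analysis used in the study; function names are illustrative.

```python
import math


def naive_mean_ci(mean_diff, sd_diff, n, z=1.96):
    """Approximate 95% CI for the mean difference, assuming n independent
    observations and a normal sampling distribution."""
    se = sd_diff / math.sqrt(n)
    return mean_diff - z * se, mean_diff + z * se


def limits_of_agreement(mean_diff, sd_diff, z=1.96):
    """Bland-Altman 95% limits of agreement from summary statistics."""
    return mean_diff - z * sd_diff, mean_diff + z * sd_diff


n_obs = 12 * 7                          # 12 intervals in each of 7 mice
ci = naive_mean_ci(1.0, 7.5, n_obs)     # reported bias and SD of differences
loa = limits_of_agreement(1.0, 7.5)
print(f"CI: {ci[0]:.1f} to {ci[1]:.1f} min")
print(f"Limits of agreement: {loa[0]:.1f} to {loa[1]:.1f} min")
```

Under these assumptions the interval for the mean bias rounds to −0.6 to +2.6 min and the limits of agreement to −13.7 to +15.7 min.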

Effect of varying duration of continuous inactivity that is used to define sleep on estimates of sleep amounts.

We performed a sensitivity analysis to assess the effects of varying the duration of continuous inactivity used to define sleep on quality of agreement between predictions of sleep based on inactivity by video and actual sleep determined from simultaneous EEG/EMG recording. We varied the inactivity threshold from 10 to 120 s in increments of 10 s. The complete data are shown in Table 2, while we show in Fig. 5 the average difference in the estimates as a function of the duration of inactivity defining sleep. These data show that the ≥40-s criterion for inactivity achieves near minimum bias (1.0 min). Although a ≥50-s criterion achieves a slightly smaller bias in absolute magnitude (−0.2 min per 2-h interval), it does so at the cost of slightly less precision (SD = 7.7 min compared with 7.5 min). The ≥40-s criterion achieves maximum precision (i.e., achieves the smallest SD of differences between predicted sleep based on inactivity relative to simultaneous EEG/EMG recording). Although an SD of 7.5 min is also achieved using a ≥10-s criterion for inactivity, the bias for this criterion is larger than for any other choice of criterion for inactivity. Thus, the sensitivity analyses confirmed that the ≥40-s criterion for inactivity is near optimal. For all criteria, expected bias has a statistically significant linear association with the amount of sleep per 2-h interval (r = 0.30, P = 0.004 for the ≥40-s criterion). Thus, bias becomes more positive as true sleep duration per 2-h interval increases. This finding is consistent with the observation that expected bias is positive during lights on when mice are sleeping more (+5.0 min per 2-h interval), and negative during lights off when mice sleep less (−3.0 min per 2-h interval).

Fig. 5.

Relationship between mean difference between estimated sleep from inactivity bouts and that from EEG/EMG-defined sleep and the duration of continuous inactivity that was set as the minimum to define sleep. The difference is for the amount of sleep in 2-h intervals. For the gold standard method, i.e., EEG/EMG recording, the average amount of sleep in 2-h intervals was 59.4 min. For each duration shown, sleep was defined as an episode of inactivity from video analysis that was greater than or equal to that time in seconds. Very short durations of inactivity lead to overestimation of sleep amounts, while longer durations lead to an underestimation. Durations of inactivity between 30 and 80 s provide close agreement, with the optimum being 40–50 s, for which the differences from reference are +1.0 and −0.2 min, respectively.

Table 2.

Bland-Altman analyses of agreement between video-based (duration of inactivity) estimates and EEG/EMG-determined measurements of sleep per 2-h interval

The mean difference between estimated sleep and EEG/EMG-defined sleep was not particularly sensitive to the definition of the duration of continuous inactivity used to define sleep over a reasonable interval (see Fig. 5). As expected, very short durations of inactivity to define sleep resulted in an overestimation of sleep amounts, since for short intervals of inactivity the mouse may be immobile but awake (see Fig. 5). In contrast, using longer durations of continuous inactivity to define sleep led to an underestimation of sleep, i.e., negative difference between estimated sleep and that from EEG/EMG.


In this report we describe a high-throughput method designed to estimate amounts of sleep and wakefulness, number of bouts, and duration of bouts of sleep and wakefulness in mice. The method is based on predicting sleep when the duration of inactivity is ≥40 s. Inactivity can be assessed with equal effectiveness by digital video analysis or by subjects breaking infrared beams. We show excellent agreement between this simple high-throughput methodology and sleep and wakefulness determined by the standard methodology of EEG/EMG recording in C57BL/6J male mice at 8–10 wk of age. Overall, there was negligible bias, which was not statistically significantly different from zero (P = 0.22). From the 95% CI, we can conclude that over all intervals across the day, systematic bias is not less than −0.6 min and not larger than 2.6 min. Analyses did reveal that bias varied with duration of sleep per interval. We found that expected bias was +5.0 min during lights-on intervals when the mouse is more often sleeping and −3.0 min during lights-off intervals when the mouse is less often sleeping. Overall, mean sleep per 2-h interval by EEG/EMG was 59.4 min, while it was 74.8 min during lights on and 44.0 min during lights off. Thus, in percentage terms, expected bias is +6.7% during lights on and −6.8% during lights off. These values are relatively small considering applications in areas such as mutagenesis where important changes in sleep per interval are not expected to be so subtle. A more important characteristic is precision of the agreement measured as the SD of differences. We found this to be 7.5 min per 2-h interval. The corresponding Bland-Altman 95% limits of agreement were −13.7 to +15.7 min. Thus, over all 2-h intervals, 95% of predicted sleep durations are expected to be within roughly 15 min of values that would be determined by EEG/EMG. It is therefore highly unlikely that predicted sleep per 2-h interval would differ from actual sleep by more than 15 min.
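The percentage figures quoted above follow from dividing the expected bias by the mean EEG/EMG-determined sleep in the corresponding period. A short Python sketch of this arithmetic is given below; the function name is illustrative.

```python
def percent_bias(bias_min, reference_sleep_min):
    """Bias expressed as a percentage of the reference (EEG/EMG) sleep amount,
    both given in minutes per 2-h interval."""
    return 100.0 * bias_min / reference_sleep_min


lights_on = percent_bias(5.0, 74.8)    # bias and mean sleep during lights on
lights_off = percent_bias(-3.0, 44.0)  # bias and mean sleep during lights off
print(f"lights on: {lights_on:+.1f}%, lights off: {lights_off:+.1f}%")
```

The two results round to +6.7% and −6.8%, respectively.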

This method will have two immediate applications. First, it can be used to screen mice that are being mutagenized as part of a forward genetics approach to identify novel genes. Several such mutagenesis programs are going on around the world that are based on C57BL/6 mice (9, 14, 16, 17). In all cases, if deviation from normal is found, and it is found to be heritable, subsequent confirmation by EEG/EMG recording of the specific abnormality is required.

A second important application is in studies of molecular changes in tissues such as specific brain regions with sleep/wake and extended wakefulness (sleep deprivation). The strategy shown here has several advantages. First, it allows more rapid quantification of behavior and reduces expense and time associated with the need for surgical implantation, recovery from surgery, and scoring of EEG/EMG records. Given the high-throughput nature of the strategy, one can screen larger numbers of mice quickly. This allows normative data to be established and hence determine whether a specific mouse being studied is a behavioral outlier. Data from this mouse can be excluded, thereby reducing nuisance variance that is not associated with the experimental conditions under study. In our own laboratory we have developed a normative data set of sleep and wake amounts in 115 C57BL/6J mice that we use for this purpose. Avoiding surgery has another advantage. It removes any potential confounding effects of the surgery itself. Maloney et al. (10) have shown that following surgery and implantation of electrodes, there is increased c-fos expression in brain in rats. The noninvasive nature of the behavioral assessment strategy described here avoids this potential problem.

Our studies were performed specifically in male C57BL/6J mice 8–10 wk of age. It is conceivable that in other strains the optimal duration of inactivity that provides the best estimate of sleep may be different. However, we have found that in C57BL/6J mice the estimates of sleep and wakefulness are relatively insensitive to the specific duration of inactivity used to define sleep (see Table 2 and Fig. 5). Very reasonable estimates of sleep are obtained in C57BL/6J mice over a range of durations of inactivity used for the definition of sleep, i.e., from 30 to 80 s. Over this range the average difference in estimated sleep ranges only from +2.3 to −2.9 min per 2-h interval. Moreover, the goal of the strategy outlined here is to screen for differences in sleep and wake amounts; if applied to different inbred strains, differences so identified need to be confirmed by EEG/EMG recording.

The method cannot, as it stands, be applied to other rodents such as rats without first determining the duration of inactivity that defines sleep. Also, the method cannot be used to define the substages of sleep, i.e., NREM and REM. It is conceivable, however, that other aspects of the behavior of the mouse that differ in these states could be identified by video analysis and we are currently investigating this.

In conclusion, we have developed and validated a simple method to estimate sleep and wakefulness in C57BL/6J mice that avoids the necessity for surgery and implantation of chronic electrodes. This method will allow sleep and wakefulness to be assessed as part of behavioral screening and in studies involving assessment of molecular changes, for example in brain, with sleep and wakefulness and sleep deprivation. Previous descriptions of techniques for behavioral screening of, for example, knockout mice, have not included sleep and wakefulness (see, for example Ref. 8), presumably because of the necessity of surgery and time to recover. The method we have described has a number of immediate applications and likely can be refined to extract other features from video analysis.


The research was supported by National Institutes of Health Program Project Grant AG-17628 and Programs for Genomic Applications HL-66611.


We are grateful to Drs. Bev Paigen, Sigrid Veasey, and Ted Abel for helpful discussions and suggestions and to Daniel Barrett for help in preparation of the manuscript.


  • Address for reprint requests and other correspondence: A. I. Pack, Translational Research Laboratories, Rm. 2120, 125 South 31st St., Philadelphia, PA 19104-3403 (e-mail: pack{at}

    Article published online before print. See web site for date of publication (

