Data collection and management

Clear objectives are required before a study is carried out. Based on these, the design, execution and analyses are planned. A key part of the research is to collect data from a number of animals. The data must be amenable to statistical analysis, from which one can draw inferences or predict future observations. There are various sources of data for use in animal breeding studies:

  1. Research scientists set up experiments and collect data from experimental animals.
  2. Data can be obtained from farms (field records) through livestock recording schemes (Module 3, Section 4.4). 
  3. Breed information data can be collected using questionnaire forms as outlined in the following section.

Scientists should have a clear understanding of the statistical principles governing the planning of experiments and the analysis and interpretation of experimental data. It is important to design experiments properly so as to collect useful data. Costs usually prohibit setting up and running experiments large enough to collect the required data, yet large data sets are often needed to obtain reliable estimates of phenotypic, genetic and environmental variation.

Data sourcing through on-farm surveys of livestock breeds

In many developing countries, information on existing livestock populations in different areas is not available and livestock recording is not regularly practised. In order to understand the production systems and develop improvement programmes, it is necessary to capture existing knowledge on the livestock breeds or populations that are considered to be of most interest. Such information is generally captured through surveys. Surveys are also used to determine the status of different breeds in a country, providing key information for developing breed improvement and conservation strategies.

Designing on-farm surveys of livestock breeds

The first step is to decide what type of survey (random, purposive, convenience or representative) is to be undertaken, and the size of the population to be surveyed. Either the whole population or a sample of the population can be surveyed. For a sample, the proportion of the farming community or households to be surveyed needs to be determined. The sample needs to be large enough to allow population values to be estimated with adequate precision; it should also cover all strata of the population related to the topic of interest. At the same time, the costs of collecting data need to be realistically considered. Sampling designs range from simple random sampling to stratified and cluster sampling [Oromiya document-ILRI]. Data in surveys are usually collected using questionnaires designed to allow accurate and unambiguous answers. Key activities when carrying out a survey are illustrated in Figure 2.


Figure 2. Key activities when planning and implementing a breed survey
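
As an illustration of the stratified designs mentioned above, the following is a minimal sketch of proportional stratified random sampling: the same sampling fraction is drawn from every stratum so that all strata are represented. The function name, the strata, the household identifiers and the 10% sampling fraction are all hypothetical and are not taken from any actual survey.

```python
import random


def stratified_sample(households_by_stratum, fraction, seed=0):
    """Draw the same sampling fraction at random from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    sample = {}
    for stratum, households in households_by_stratum.items():
        # Take at least one household so that no stratum is left out.
        n = max(1, round(len(households) * fraction))
        sample[stratum] = rng.sample(households, n)
    return sample


# Hypothetical sampling frame: household IDs grouped by production system.
frame = {
    "pastoral": [f"P{i:03d}" for i in range(120)],
    "mixed crop-livestock": [f"M{i:03d}" for i in range(60)],
    "peri-urban": [f"U{i:03d}" for i in range(20)],
}

chosen = stratified_sample(frame, fraction=0.10)
for stratum, ids in chosen.items():
    print(stratum, len(ids))
```

In practice the sampling fraction would be set from the precision required and the budget available, and it need not be the same in every stratum.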

Pre-testing a questionnaire on a small number of farms or households is an essential way of evaluating its suitability and the level of detail that can be obtained from the interviewees. For example, if the purpose of the survey is to estimate the population of livestock in a given area and the basic unit is a village, then one must ensure that:

  • the total number of households in a village is known 
  • the number of such households that keep livestock is known
  • the average number of livestock per livestock-keeping household is known. 

These can be obtained during pre-survey visits; they give an indication of how best to achieve high accuracy and precision during the survey (see Module 2, Section 2).
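
The three quantities listed above combine into a simple village-level estimate: the number of livestock-keeping households multiplied by the average number of animals per such household. The sketch below shows the arithmetic only; the function name and the figures used are hypothetical.

```python
def estimate_village_livestock(total_households, livestock_households,
                               mean_animals_per_keeper):
    """Return (share of households keeping livestock, estimated animal count)."""
    if not 0 <= livestock_households <= total_households:
        raise ValueError("livestock_households must lie between 0 and total_households")
    share = livestock_households / total_households if total_households else 0.0
    estimate = livestock_households * mean_animals_per_keeper
    return share, estimate


# Hypothetical pre-survey figures for one village.
share, animals = estimate_village_livestock(
    total_households=200,
    livestock_households=150,
    mean_animals_per_keeper=6.4,
)
print(f"{share:.0%} of households keep livestock; about {animals:.0f} animals")
```

Summing such estimates over the villages in an area gives a first approximation of the area's livestock population, which helps in planning the sample size for the full survey.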

Implementing on-farm surveys

In implementing on-farm surveys, the following should be considered:

  • Adequate prior and mid-stream consultations with all stakeholders (farmers, local administrative officials, politicians, donors etc.)
  • Timing of the survey (season and even month within seasons) 
  • Time for visits to farms, and where to interview respondents (in the homestead or on grazing fields) 
  • Who the respondents should be (household heads, children or employees). 

A combination of all the above may actually be used. For example, in a society where milking is done exclusively by children and women, the best answers to the question of how much milk an animal produces daily would be given by the family members who actually do the milking, although the household head may respond to the rest of the questionnaire.

Breed descriptor charts and guidelines on animal phenotypic characteristics, such as those developed by ILRI and used for the Oromiya-ILRI Livestock Breed Survey (2001), may be available to assist enumerators and questionnaire administrators in making on-farm survey decisions. In addition, the occasional use of photographs to capture whole herds, while in pens, kraals or grazing, greatly helps to countercheck the accuracy and consistency of such scoring. Likewise, asking different members of the household the same question may help to reveal discrepancies, especially where respondents seem to be giving pre-planned or implausible answers.

Data management and exploration

Raw data should be entered into the computer in such a way that the information can be found and understood long after the time of data entry, and should be checked for possible errors. The data are then organized into an appropriate form for analysis. All data should be archived so that they remain available for later reference. A good data management strategy should be adopted, using data management software such as Access or Oracle that has facilities for some data checking at the time of entry. Spreadsheet packages (e.g. Excel, Lotus 1-2-3), though simple and apparently flexible, should be used with caution for data management.
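
The kind of checking that can be done at the time of data entry is sketched below: each record is tested for a missing or duplicated identifier, an invalid code and an implausible value. This is only an illustration; the field names, the sex codes and the plausible weight range of 1 to 1500 kg are assumptions and would need to be set for the species and trait actually recorded.

```python
VALID_SEXES = {"F", "M"}          # assumed coding for sex
WEIGHT_RANGE_KG = (1.0, 1500.0)   # assumed plausible range, species dependent


def check_record(record, seen_ids):
    """Return a list of problems found in one animal record (empty if none)."""
    problems = []
    animal_id = record.get("animal_id", "")
    if not animal_id:
        problems.append("missing animal_id")
    elif animal_id in seen_ids:
        problems.append("duplicate animal_id")
    if record.get("sex") not in VALID_SEXES:
        problems.append("sex must be F or M")
    try:
        weight = float(record.get("weight_kg", ""))
    except (TypeError, ValueError):
        problems.append("weight_kg is not a number")
    else:
        low, high = WEIGHT_RANGE_KG
        if not low <= weight <= high:
            problems.append("weight_kg outside plausible range")
    return problems


# Hypothetical records as they might be typed in from field sheets.
records = [
    {"animal_id": "A001", "sex": "F", "weight_kg": "312"},
    {"animal_id": "A002", "sex": "X", "weight_kg": "298"},
    {"animal_id": "A001", "sex": "M", "weight_kg": "3120"},
]

seen = set()
report = {}
for row_number, record in enumerate(records, start=1):
    problems = check_record(record, seen)
    if problems:
        report[row_number] = problems
    seen.add(record["animal_id"])

for row_number, problems in report.items():
    print(row_number, "; ".join(problems))
```

Records that fail a check are best referred back to the original field sheets rather than corrected by guesswork.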

Data exploration

Once the necessary edits to the data have been made, it is important to understand the data structure and the patterns displayed in the data in order to decide how best to conduct the statistical analysis [Biometrics example 1] [ICAR technical series on animal recording]. The distribution of animals across different classifications (e.g. age and sex) can be determined, and the mean, median and range for each factor or classification variable summarized. These statistics can then be used to group the animals into suitable subclasses that reflect the variation in the data associated with a particular factor. Furthermore, such statistics help to ensure that each subclass contains enough animals to allow reasonable inferences to be made about the influence of different levels of the factor on the trait being studied.
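
The summaries described above can be produced with a few lines of code. The following sketch groups records by one or more classification variables and reports the number of animals, mean, median and range of a trait in each subclass. The function name, field names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def summarise_by_subclass(records, factors, trait):
    """Summarise a trait within each combination of the given factors."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[factor] for factor in factors)
        groups[key].append(record[trait])
    summary = {}
    for key, values in sorted(groups.items()):
        summary[key] = {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


# Hypothetical body weights classified by sex and age class.
animals = [
    {"sex": "F", "age_class": "adult", "weight_kg": 300},
    {"sex": "F", "age_class": "adult", "weight_kg": 320},
    {"sex": "F", "age_class": "young", "weight_kg": 150},
    {"sex": "M", "age_class": "adult", "weight_kg": 410},
    {"sex": "M", "age_class": "young", "weight_kg": 170},
    {"sex": "M", "age_class": "young", "weight_kg": 190},
]

summary = summarise_by_subclass(animals, ["sex", "age_class"], "weight_kg")
for subclass, stats in summary.items():
    print(subclass, stats)
```

Subclasses with very few animals, such as the single adult male here, are candidates for merging with a neighbouring class before analysis.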

The number of observations per subclass usually varies for field data and some experimental data. In some cases, data that initially had an equal number of observations per subclass can end up having different numbers of observations after data editing. Data with an unequal number of observations per subclass are known as unbalanced data; there are statistical methods that have been developed to handle such data [Biometrics example 2].
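
Whether a data set is balanced can be checked by counting the observations in every subclass, including combinations of factor levels that have no records at all. The sketch below does this and shows how removing a single record during editing turns a balanced data set into an unbalanced one. The function names, factor names and data are hypothetical.

```python
from collections import Counter
from itertools import product


def subclass_counts(records, factors):
    """Count observations in every combination of observed factor levels."""
    levels = [sorted({record[factor] for record in records}) for factor in factors]
    counts = Counter(tuple(record[factor] for factor in factors) for record in records)
    # Include empty cells so that missing subclasses show up as zero.
    return {cell: counts.get(cell, 0) for cell in product(*levels)}


def is_balanced(counts):
    """Data are balanced when every subclass has the same number of observations."""
    return len(set(counts.values())) == 1


# Hypothetical design: two breeds by two seasons, two animals per subclass.
raw = [
    {"breed": breed, "season": season}
    for breed in ("A", "B")
    for season in ("dry", "wet")
    for _ in range(2)
]
edited = raw[:-1]  # one record discarded during data editing

print(subclass_counts(raw, ["breed", "season"]), is_balanced(subclass_counts(raw, ["breed", "season"])))
print(subclass_counts(edited, ["breed", "season"]), is_balanced(subclass_counts(edited, ["breed", "season"])))
```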

In an analysis, the pattern of data is described using a model. The final model that is used to describe the data will serve as the best judge of the quality of statistical analysis. An appropriate model can only be chosen when one understands the data.