Incorporating Human Reliability Analysis to Enhance Maintenance Audits: The Case of Rail Bogie Maintenance

Human errors occurring during railway maintenance activities can significantly reduce the availability of equipment. Identifying potential human errors and their causes, and predicting the associated probabilities, are important stages in managing such errors. This paper investigates the probability of human error during the maintenance of railway bogies. A case study examines technicians performing maintenance on the disc brake assembly unit, wheel set, and bogie frame under various error producing conditions in a railway maintenance workshop in Luleå, Sweden. The Human Error Assessment and Reduction Technique (HEART) is employed to determine the probability of human error occurring during each of the maintenance tasks, while fault tree analysis is used to define the potential errors throughout the maintenance process. The probability of a technician committing an error during the maintenance of the disc brake assembly, wheel set, and bogie frame is found to be 0.20, 0.039 and 0.021 respectively, with the human error probability (HEP) for the entire bogie being 0.24. Time pressure, the ability to detect and perceive problems, over-riding information, the need to make decisions, and mismatches between the operator's and the designer's models turn out to be major contributors to human error. These findings can help maintenance management personnel to better understand the error producing conditions that may lead to errors and, in turn, serve as an input to modify policies and guidelines for railway maintenance tasks.

Sarbjeet Singh et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

1. INTRODUCTION
Dhillon (2007) highlights the key role of the railway system in a nation's economy. It is therefore imperative for all stakeholders in railway networks worldwide to aim for a safe, highly reliable and high-quality railway system (Wilson et al., 2007). While the safety of railway operations within this system depends on several factors, the role of the human is crucial and increasingly recognized as such (Hollnagel, 1998; Priestley and Lee, 2008). A large number of railway accidents, in both operational and maintenance procedures, occur due to degraded human performance (Dhillon, 2007), where human performance is described as the human capabilities and limitations that have an impact on the safety and efficiency of operations (Maurino, 1998). Indeed, the personnel performing maintenance tasks are confronted with a set of error producing conditions (EPCs) within rigorous railway maintenance systems, which can degrade their performance. Such EPCs include: time pressure, negligible feedback, confined work spaces, awkward body positions (e.g. bent and/or twisted backs, both arms above the shoulder), poorly written procedures and the lack of access to the required equipment. These conditions, typically in combination, result in various forms of errors and consequently in failures and accidents. Surprisingly, human error in railway maintenance has not received the attention it deserves in research, even though Shapero and his colleagues highlighted as far back as 1960 (Shapero et al., 1960) that human error is responsible for 20-50% of equipment failures. One consequence has been a number of high-profile railway accidents due to human factors-related maintenance problems, e.g.:
• The Grayrigg train accident in the UK in 2007, when a train travelling from London to Glasgow derailed, resulting in one fatality. This happened when a fault in the stretcher bar of the points caused the left and right switch rails to become disconnected (RAIB, 2007);
• The 1998 accident on the German Inter City Express (ICE) at Eschede, where an eccentric wheel led to wheel tyre failure, causing several deaths.
Whilst it is nearly impossible to eradicate human error, it can be minimised through a good maintenance management plan and an understanding of the issues that affect errors (HSE, 2000). For such a plan to be effective, it is a prerequisite to identify all the potential human errors and then quantify their probability of occurrence with an appropriate statistical approach. Human reliability analysis (HRA) techniques offer an opportunity to do this, as they aim to identify, quantify and reduce the likelihood of errors occurring within a system and thereby improve the overall level of safety in this system (Kim et al., 2003). Such techniques have been used in a wide range of industries, including the healthcare, engineering, nuclear, transportation and business sectors. Different HRA approaches, such as THERP (Technique for Human Error Rate Prediction), ASEP (Accident Sequence Evaluation Programme) and HEART (Human Error Assessment and Reduction Technique), have been developed to predict the probability of human error. Originating in the nuclear safety industry, HEART is highly flexible, and Humphreys (1995) notes its applicability over a wide range of areas. In particular, it is a task-based analysis (Kirwan, 1994) rather than a decompositional approach focusing on types of error.
In the UK, the Rail Safety and Standards Board (RSSB, 2012) has introduced the Railway Action Reliability Assessment, based upon HEART, to estimate human error probabilities for railway operations. However, this technique has not been applied to railway maintenance, and it is beyond its scope to provide a detailed list of factors that affect the performance of operators; for example, it ignores the safety culture or the safety management of an organisation (RSSB, 2012, p. 15).
Amongst the most safety-critical components of rolling stock is the railway bogie, comprising the brake disc, wheel and frame. These important components must meet strict safety rules, in terms of the stopping distance associated with a maximum average deceleration, in all environmental conditions. Regular maintenance audits of such components are conducted, but these typically ignore the human error aspects of the maintenance tasks. Singh et al. (2014) and Singh and Kumar (2015) have previously analysed the potential maintenance human error for the disc brake unit and the wheel of the railway bogie, using HEART, based upon a case study in the railway maintenance workshop in Luleå, Sweden. This paper presents the integrated results of the follow-up to these two previous studies by analysing the HEPs in the performance of maintenance tasks for the combined disc brake unit, wheel set and bogie frame under various error producing conditions in the same maintenance workshop. The HEART technique is again employed to determine the probability of human error occurring during each maintenance task, and the implications of its use are discussed. The following sections provide a summary of the two previous studies as relevant to this paper.

Maintenance and Human Factors: Railway Bogie
The railway sector operates on a vast scale, covering long distances, moving millions of tons of freight and carrying millions of passengers. It has played a vital role in socio-economic development around the globe. As economies grow, the railway sector has taken several technological and policy-related initiatives to meet emerging challenges. While the technological initiatives are directed towards improved utilization of assets and reduced human dependence, they have in turn created a new dimension of human interface. The human factor in railway safety has now become a function of several additional factors, making it more critical and complex than in the past.
Maintenance can be defined as a set of activities required to keep a system in "as-built" condition with its original productive capacity (Reason, 2000). Maintenance activities for rolling stock include a number of tasks that are prone to serious human errors, and any negligence in maintenance can result in accidents and a subsequent loss of lives. Human error, in general, can be defined as the failure to perform a specific task that could lead to disruption of scheduled operation or result in damage to property and equipment (Dhillon et al., 2006). Dhillon (2002) has claimed that maintenance error is linked to incorrect repair; further, the occurrence of maintenance errors rises with increased maintenance frequency. Hence, understanding the root causes of errors is the first stage in managing maintenance human error. The objective is to identify potential factors causing the overall effect in order to reveal key relationships among the various factors, and the possible causes offer additional insight into process behaviour (Singh and Kumar, 2015). In the case study analysed in this paper, the causes of maintenance error were derived from group discussions among technicians, supervisors and academic experts. Ten group brainstorming sessions were held, each lasting approximately two hours. Nine experts, i.e., five technicians, two supervisors and two academics, participated in the sessions. Among them, the maintenance technicians were between 42 and 45 years old, their height and weight ranged between 178-190 cm and 75-85 kg respectively, and their working experience was between 20 and 25 years. The supervisors' working experience was 11 to 12 years, while the two academic experts each held a PhD in railway maintenance. None of the experts participating in this study had a history of chronic or acute illness, hypertension or any other major health issues, and no one was taking any prescribed medication prior to or during the period of our study.
Based on the findings from the brainstorming sessions, the authors constructed the most relevant cause categories. These sessions involved a set of questionnaires followed by technical discussions. In general, human errors in railway maintenance include: disassembly errors, inspection errors, maintenance errors, assembly errors and installation errors.
The causes of human errors in railway maintenance can be categorized into four main groups: i) workplace design and environmental factors, ii) maintenance task factors, iii) subject factors and iv) organization factors.
Human errors in maintenance result in incorrect decisions, incorrect actions, incorrect checks or, conversely, correct checks on the wrong object.
The railway maintenance workshop of our case study in Luleå, Sweden, uses two types of maintenance audit programs for a railway bogie: R1 and the more detailed R4.
The former is an audit conducted after a maximum of 1,200,000 km, whereas the latter is a more detailed audit conducted after 3,600,000 km. The R1 maintenance audits for railway bogies correspond to the detection, monitoring and repair of the disc brake unit assembly, wheel set and bogie frame. The potential human errors associated with such maintenance activities are discussed briefly below. In the present study, the human error probability of the railway bogie, consisting of the disc brake assembly unit, wheel set assembly and bogie frame, has been evaluated.
The disc brake assembly unit on the bogie has four brake packages with braking motion and a brake unit. The latter consists of a brake actuator integrated with a brake controller. The wheel set has two brake discs; each is associated with a specific lever in the brake unit. Careful and regular maintenance is required to ensure even distribution of forces to all wheels. Badly set up rigging will cause wheel flats or lead to inadequate brake force. Brainstorming sessions in the railway maintenance workshop revealed that poorly executed maintenance tasks on the disc brake assembly unit, such as improper lubrication of the brake disc or undersized fitting of the brake block, tapping screws and cylindrical bolts, can cause serious errors. Moreover, incorrect measurement of brake movement results in a delay in brake lever movement, thereby reducing brake performance. This affects the distribution of braking forces from a brake cylinder to the wheels on the vehicle.
In the railway bogie, the wheel set is the wheel-axle assembly of a railroad car. During maintenance, the maintenance technicians perform visual inspections (maintenance type R1) of the wheel set, axle-mounted brake disc and bearing box. Visual inspection of an axle-mounted brake disc and wheels can identify cracks and surface imperfections, but these manual tasks may increase the probability of human error. Moreover, wheel profile measurement, such as fin height, flange thickness, diameter of running circle, limit to turning, and Qr (flange slope), is also done manually, escalating the probability of human error and possibly resulting in wheel-set misalignment and increased fuel consumption.
The bogie frame is spring-loaded and guided in the wheel set by rubber elements (Figure 2) placed between the horn blocks and axle box. The stiffness of the rubber material is selected so that the wheel pair has the desired mobility in all directions relative to the bogie frame. The two spring groups are placed on each side of the bogie. Each spring group consists of two spring assemblies with inner and outer coil springs in which a yoke is mounted.
The yokes are connected via tie rods which transmit tensile and compressive forces from the basket to the bogie via primary suspension and a spindle. The yokes are bolted to the basket. In R1 maintenance, the bogie frame, bolster and yoke are inspected for possible cracks, deformation and scuff marks. Cracks are inspected visually, with a focus on welds in the cross member to the brake packages and welds in the brackets for the tie rods. Manual cleaning of the pandel lanks and inspection of the dampers (vertical, horizontal and gear) are included in R1 maintenance of the bogie frame. All dampers (vertical, horizontal and gear) are replaced. Care is taken to prevent the leakage of oil from the damper, as this reduces its performance. The pandel lank, which helps in adjusting the height of the bogie, is thoroughly cleaned to remove any foreign matter.

HUMAN RELIABILITY ANALYSIS OF RAILWAY BOGIE
Nine participants were monitored while executing their maintenance tasks and questioned during the execution and on completion of tasks related to the maintenance of a bogie (Figure 1a). The maintenance tasks were first defined, followed by a detailed analysis of the bogie components in order to identify potential human errors that could cause system failures, leading to the development of a relevant risk model (Figure 2). In this study we used the Human Error Assessment and Reduction Technique (HEART) (Williams, 1988) to evaluate the probability of a human error occurring during the completion of a specific maintenance task.
HEART suggests that every time a task is performed, there is a likelihood of failure and the associated probability of this failure is affected by one or more error producing conditions (EPCs), for instance the shortage of time or inexperience.This technique incorporates the most widely used estimates of error rates of generic tasks.There are 9 Generic Task Types (GTTs) described in HEART and each is associated with a nominal human error probability (HEP).In addition, there are 38 Error Producing Conditions (EPCs) that may affect reliability, each with a maximum amount by which the nominal HEP can be multiplied.In this study we selected the GTT F, which refers to the "restore or shift a system to original or new state following procedures, with some checking".
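As a minimal sketch of how such a calculation can be carried out, the Python fragment below multiplies a nominal HEP by one assessed effect per EPC, using the standard HEART form (maximum multiplier minus one, times the assessed proportion, plus one). The function names are hypothetical, the multipliers 11 (shortage of time) and 9 (over-riding information) are the values commonly quoted in HEART tables and should be checked against Williams (1988), and the proportions 0.4 and 0.25 are invented for illustration; none of these numbers are taken from the case study.

```python
from math import prod


def assessed_effect(max_multiplier: float, proportion: float) -> float:
    """Assessed effect of one EPC: (max multiplier - 1) * proportion + 1."""
    if max_multiplier < 1.0:
        raise ValueError("an EPC multiplier should be at least 1")
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("the assessed proportion must lie in [0, 1]")
    return (max_multiplier - 1.0) * proportion + 1.0


def heart_hep(nominal_hep: float, epcs: list[tuple[float, float]]) -> float:
    """HEP = nominal HEP times the product of the assessed effects.

    `epcs` holds (max_multiplier, proportion) pairs, one per EPC.
    The result is capped at 1.0 because it is a probability.
    """
    hep = nominal_hep * prod(assessed_effect(m, p) for m, p in epcs)
    return min(hep, 1.0)


NOMINAL_GTT_F = 0.003  # nominal HEP for generic task type F

# Illustrative (hypothetical) assessment: two EPCs acting on one task.
example_epcs = [
    (11.0, 0.4),   # shortage of time, assumed proportion 0.4
    (9.0, 0.25),   # over-riding information, assumed proportion 0.25
]

print(round(heart_hep(NOMINAL_GTT_F, example_epcs), 4))  # 0.045
```

With a proportion of zero an EPC leaves the nominal HEP unchanged, and with a proportion of one it applies its full multiplier, which is why the proportion assigned by the assessors has such a strong influence on the result.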
Figure 2. Human Reliability Assessment Process (Adapted from Kirwan, 1994)
The proposed nominal human unreliability for this GTT equals 0.003, based upon Williams' (1988) analysis. The brainstorming sessions helped to identify the human activities that may lead to a potential system failure. These sessions also helped to build a fault tree in order to determine the undesired events that could lead to the failure. The tasks related to the maintenance of the wheel, disc brake unit and frame were identified and examined in detail, and the information was reviewed from the perspective of risk analysis of the system. These tasks were then grouped into disassembly tasks, inspection tasks, maintenance tasks, and assembly, installation and testing tasks. Each was further divided into potential errors in subtasks, such as D1, D2, D3, D4 (for disassembly), M1, M2, M3, M4, M5 (for inspection) and so on (see Table 1), and the HEART nominal human reliability values, as given by Williams (1988), were assigned to each task. The HEART HEPs were evaluated by applying error producing conditions and the engineer's proportion of affect (EPOA) (Williams, 1988). An EPOA ranging from 0 to 1 was assigned to each task by two maintenance supervisors. It was observed that many maintenance tasks on the railway bogie involved more than one error producing condition. The error producing conditions (EPCs) were considered and applied to each task by an expert panel, and the selected EPCs were then used to calculate the HEP using Eq. (1):

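$$\mathrm{HEP} = \mathrm{GTT} \times \prod_{i} A_i, \qquad A_i = (\mathrm{EPC}_i - 1) \times \mathrm{EPOA}_i + 1 \tag{1}$$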
where GTT is the nominal human error probability associated with each generic task. Table 1 shows the HEPs of each sub-task associated with the maintenance of the whole railway bogie (disc brake unit, wheel set assembly and bogie frame). The complete table including all the relevant tasks is provided in Appendix 1. In this study it was assumed that every time a task is performed during maintenance, there is a likelihood of failure (Kirwan, 1994). This facilitated our evaluation of the probability of human error associated with each task and allowed a deeper understanding of the impact of each individual task.
Appendix 2 illustrates the fault tree developed to facilitate the analysis (Singh et al., 2014; Singh and Kumar, 2015). The events that result in the occurrence of the top event are connected by the logic gates AND and OR. The OR gate provides a true output (i.e., fault) when one or more of its inputs are true (fault). After analyzing the maintenance tasks, the top fault event "M" (in this case study, a technician making an error while doing maintenance on the bogie) and the possible causes or basic fault events (brake disassembly error, brake inspection error, brake maintenance error, brake installation and testing error, wheel inspection error, frame inspection and maintenance error) that cause the top event to occur were identified using the OR gate. A fault tree was then developed down to the lowest level (Appendix 2). The occurrence probability of the technician making an error (top event) was calculated using the probabilities of occurrence of the basic fault events (disassembly error, brake inspection error, brake maintenance error, etc.; Appendix 2). In this study we assume that the input events occur independently, and the probability of occurrence of the OR gate output fault event is given by Eq. (2) (Dhillon, 1999):

$$P(y_0) = 1 - \prod_{i=1}^{k} \left[ 1 - P(y_i) \right] \tag{2}$$

where P(y0) is the probability of occurrence of the OR gate output fault event y0, k is the number of OR gate input fault events, and P(yi) is the occurrence probability of OR gate input fault event yi, for i = 1, 2, 3, …, k.
While our approach provides individual estimates, it must be mentioned that there can also be dependencies between the different task steps. Such dependencies could exist, for instance, because two task steps are carried out by the same individual, or because one task is checked by a second person who may not be totally independent. Furthermore, one error may make a subsequent error more likely, or an error could be made repeatedly in a recurring process (RSSB, 2012). However, the need to consider dependency in this study is reduced since the GTTs are at a rather high level of task detail, and thus dependency for sub-tasks is considered in the HEP associated with the GTT (RSSB, 2012). The issue of dependency could also be addressed by ensuring that combined human actions in the fault tree do not exceed established limits of human reliability, as described by Kirwan (1994).
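The OR-gate combination referred to as Eq. (2) can be sketched in a few lines of Python, assuming independent input events as stated above: the output probability is one minus the product of the complements of the input probabilities. The function name `or_gate` and the input values are hypothetical and serve only to illustrate the arithmetic; they are not the HEPs of the case study.

```python
from math import prod
from typing import Iterable


def or_gate(probabilities: Iterable[float]) -> float:
    """Probability that at least one independent input fault event occurs."""
    probs = list(probabilities)
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    # 1 - product over i of (1 - P(y_i)); independence is assumed.
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical sub-task HEPs feeding two intermediate events.
disassembly_subtasks = [0.01, 0.02]
inspection_subtasks = [0.05]

disassembly_error = or_gate(disassembly_subtasks)
inspection_error = or_gate(inspection_subtasks)

# Combining the intermediate events through a further OR gate gives the
# same result as one flat OR gate over all basic events.
top_event = or_gate([disassembly_error, inspection_error])
print(round(top_event, 5))  # 0.07831
```

Because nested OR gates over independent events collapse to a single OR gate over all basic events, the top-event probability of an OR-only tree does not depend on how the sub-tasks are grouped into intermediate events; the grouping mainly helps to show which branch contributes most.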
The maintenance of the disc brake unit includes disassembly, measurement and inspection, corrective maintenance, assembly, installation, and testing.
To calculate the probability of the top event, the probabilities of occurrence of the disassembly error (DE), inspection error (BIE), maintenance/repair error (BME), assembly error (BAE), installation error (IE) and testing error (TE) are calculated using Eq. (3). For instance, the probability of occurrence of the inspection error (BIE) is calculated in this way from the HEPs of its sub-tasks. Based on the information included in Table 2 and by employing Eq. (2), the probability of the event D, i.e., a technician commits an error while performing maintenance on the disc brake unit, is 0.2093. Following the same rationale, the probability of the event M, i.e., a technician makes an error while conducting the maintenance tasks on the bogie, is finally calculated, giving the HEP of 0.24 reported for the entire bogie.

MANAGEMENT OF HUMAN ERROR
It is pertinent to mention that the management of human error involves not only the investigation of past cases but also the improvement of the present situation in order to solve future problems in an organization (Grozdanovic and Stojilkovic, 2006). The previous section outlined the method used to assess the EPCs and the subsequent HEP calculations for tasks related to the maintenance of the rail bogie. Table 2 outlines the six most common EPCs in this study. This list can be used as a checklist of EPCs when conducting the maintenance audits.

Table 2. Checklist of EPCs for maintenance of Railway Bogie
The top three EPCs were over-riding information, shortage of time available and, finally, the ability to detect and perceive. Each of these provides a basis for mitigation, respectively: provision of clear, self-explanatory manuals; scheduling sufficient time for maintenance tasks; and holding regular, targeted training and workshops for maintenance staff. Based on our findings, Figure 3 proposes a maintenance decision model to improve the overall quality of maintenance in the workshop. This model was verified by two maintenance supervisors (with more than 15 years of experience each). The use of this model is expected to improve the quality of maintenance, enhance safety and lower maintenance costs; it will help management to explore and evaluate the error producing conditions that adversely affect the performance of maintenance technicians. Currently, the two maintenance audits mentioned earlier, R1 and R4, do not include any analysis of human error (e.g. EPCs); the proposed maintenance decision model can therefore be incorporated into these audits in future.

CONCLUSIONS
Railways have become an integral part of a nation's economy, and the future growth of a nation relies increasingly upon a safe and efficient railway network. The importance of human performance and human error in ensuring the safety of railway operations has been increasingly recognised. This has led to considerable efforts to understand the factors underlying human error for train drivers, signallers and dispatchers in order to mitigate them. However, whilst accidents can also arise from human error in railway maintenance activities, there is little in the research to help identify these factors and to subsequently assess the probability of human error. For maximum reliability, equipment must be kept in good working condition, and for this, regular maintenance is critical. A number of factors directly or indirectly result in a decline in human performance, leading to errors in maintenance tasks. Typically, maintenance workshops for rolling stock rely upon periodic maintenance audits to ensure the safety of maintenance, but such audits fail to explicitly account for human performance and human error.
We argue that the methodology shown in this paper to analyse human error probabilities in a railway maintenance system, specifically for the safety-critical railway bogie, can be used to bridge this gap. First, it provides the relevant stakeholders with a robust, general methodology to better understand the role of humans in railway maintenance operations. Second, it can assess the performance of railway maintenance personnel in their workplace by using a well-known human reliability analysis method, HEART. The fact that experienced maintenance personnel provided the inputs and validated the outputs lends credence to the methodology.
The proposed methodology can help maintenance management understand various error producing conditions and serve as an input to modify policies and develop better guidelines for railway maintenance tasks. At the operational level, this should enable railway maintenance organisations to develop robust solutions to enhance human performance by i) identifying and assessing the error producing conditions that most affect the performance of railway operators, and in turn monitoring human performance, and ii) investing resources to mitigate the impact of these factors on human performance. In addition, the list of EPCs can be used as a checklist to enhance the current maintenance audits, and monitoring these EPCs over time will also enhance the overall management of maintenance.
Miltos holds a PhD in Human Factors and railway safety from Imperial College London, an MSc in Mechanical Engineering from ETH Zurich and a Diploma in Mechanical Engineering from the Aristotle University of Thessaloniki. Miltos works at the Singapore-ETH Centre, where he is involved in the Future Resilient Systems programme.
He is conducting research on human performance, human reliability, safety and resilience of critical infrastructure.
Prior to his appointment at FRS, Miltos was working at Delft University of Technology in the Human Factors of Automated Driving project.He was researching the legal and market perspectives of automated driving, as well as the impact of autonomous vehicles on road safety, performance and behaviour.

Figure 3. Proposed maintenance decision model

Table 1. Probability of human errors for individual tasks