# A Note on the Complexity of Reliability in Neural Networks 

Piotr Berman* ${ }^{* \dagger}$ Ian Parberry* ${ }^{* \ddagger}$ Georg Schnitger ${ }^{* \dagger}$


#### Abstract

It is shown that in a standard discrete neural network model with small fan-in, tolerance to random malicious faults can be achieved with a log-linear increase in the number of neurons and a constant factor increase in parallel time, provided fan-in can increase arbitrarily. A similar result is obtained for a nonstandard but closely related model with no restriction on fan-in.


## 1 Introduction

One advantage that biological neural systems have over conventional computers is their ability to perform reliable computations with unreliable hardware. Carver Mead (quoted in [6]) has observed that:
"The brain has this wonderful property - you can go through and shoot out every tenth neuron and never miss them".

A plausible interpretation of this observation is that correct computations can be carried out with high probability when one of of ten neurons are destroyed at random.

We say that a circuit is reliable if it performs correctly with high probability in the presence of random faults. That is, if the neurons are damaged independently with low probability, then with high probability the circuit still computes the same function. We will show that discrete neural networks can be made reliable with a small increase in size and depth (at most a low-degree polynomial, and a constant factor, respectively).

The results in this paper can be re-expressed in a stochastic model in which the gates in the circuit are noisy in the sense that they fail independently with probability $\epsilon$. Similar work has been done on the simulation of standard circuits of fan-in 2

[^0]by reliable circuits of fan-in 2 (see von Neumann [16], Dobrushin and Ortyukov [3], Pippenger [13, 14], and Feder [4]). We apply Pippenger's techniques to the simulation of standard circuits of small fan-in $f$ by reliable threshold circuits of arbitrary fanin. We find that any threshold circuit of size $z$ and depth $d$ with small fan-in can be simulated by a reliable threshold circuit of size $O(z \log z)$ and depth $O(d)$. Our bounds are much cleaner and expose a fundamental relationship between the size and accuracy of the reliable circuit: whilst the size of the reliable circuit must be larger than that of the original threshold circuit by a log-linear amount, the accuracy of the reliable circuit (which is a measure of its reliability compared to the optimum) can be increased with only a further linear increase in size.

The problem of whether a similar result holds for the simulation of unbounded fanin threshold circuits by reliable threshold circuits is still open. Hajnal et al. [5] have a result of the correct nature, but with an increase in size that is superexponential in depth. We present a reliability result of the right flavour in a slightly different model, called a summation circuit. Summation circuits are technologically plausible and equivalent in resource usage to threshold circuits in an error-free environment.

The main body of this paper is divided into five short sections. The first section contains a description of threshold circuits. The second section contains a reminder of some important results from probability theory. The third section contains the reliability result for small fan-in threshold circuits. The fourth section contains a description of summation circuits. The fifth section contains the reliability result for summation circuits.

A preliminary version of the results in this paper was presented in a poster session at the First Annual Meeting of the International Neural Network Society in Boston, MA, in September 1988. An earlier version of the results in Section 4 appears in Parberry [9].

Throughout this paper, R denotes the set of real numbers, $\mathrm{R}^{+}$denotes the set of positive real numbers, $Z$ the set of integers, $N$ the set of natural numbers, and $B$ the Boolean set $\{0,1\}$. All logarithms are to base two unless otherwise indicated.

## 2 Threshold Circuits

A threshold function $f: \mathrm{B}^{n} \rightarrow \mathrm{~B}$ is defined by a set of weights $w_{1}, \ldots, w_{n} \in \mathrm{R}$ and a threshold $h \in \mathrm{R}$ as follows. If $x_{1}, \ldots, x_{n} \in \mathrm{~B}$,

$$
f\left(x_{1}, \ldots, x_{n}\right)=1 \text { iff } \sum_{i=1}^{n} w_{i} x_{i} \geq h
$$

Some examples of threshold functions include conjunction ( $w_{i}=1$ for $1 \leq i \leq n$, $h=n$ ), disjunction ( $w_{i}=1$ for $1 \leq i \leq n, h=1$ ), complement ( $n=1, w_{1}=-1$, $h=0$ ), and majority ( $w_{i}=1$ for $1 \leq i \leq n, h=n / 2$ ), A threshold gate is a device that computes a threshold function.

A threshold circuit is a finite layered circuit constructed without feedback from threshold gates. We assume no bound on the fan-out of the circuit (the number of
places that the output of each gate gets used), nor on the fan-in (the number of inputs to each gate). A threshold circuit with $n$ inputs and $m$ outputs computes a function with domain $\mathrm{B}^{n}$ and range $\mathrm{B}^{m}$ in the natural way. We will for the most part restrict ourselves to the case $m=1$. The size of a threshold circuit is the number of gates used. The depth is the number of layers. We assume that the circuit operates in parallel, so that the size is a measure of hardware requirements, and the depth is a measure of parallel time requirements.

Threshold circuits are a simple discrete neural network model that is well-studied in the literature. For example, it is known that for every threshold function there exists a set of weights $w_{1}, \ldots, w_{n} \in \mathbf{Z}$ such that $w_{i}=O\left(n^{(n+1) / 2} / 2^{n}\right)$ for $1 \leq i \leq n$ (Muroga, Toda, and Takasu [8]). We will, throughout this paper, assume that the weights are integers within this bound. It is well known that the weights can be made $\pm 1$ with only a polynomial increase in size and a constant multiple increase in depth (Parberry and Schnitger [11, 12]). Siu and Bruck [15] have improved the size and depth overheads in exchange for increasing the weights to a polynomial in the fan-in. For a survey of elementary results on threshold circuits, the reader can consult Parberry [9, 10].

## 3 A Result from Probability Theory

We will make great use of the following well-known result from probability theory. Suppose we make $m$ independent Bernoulli trials each with probability $\epsilon$ of failure. Let $B(k, m, \epsilon)$ denote the probability that at least $k$ trials fail.

Lemma 3.1 If $k \geq \epsilon m$, then $B(k, m, \epsilon) \leq(\epsilon m / k)^{k}((1-\epsilon) m /(m-k))^{m-k}$.
Proof: See Chernoff [2].
The following result from Angluin and Valiant [1] states that the probability of substantially more than the expected number of failures happening is exponentially small in the size of the sample.

Lemma 3.2 If $k=\epsilon m(1+\beta)$ for some $0 \leq \beta \leq 1$, then $B(k, m, \epsilon) \leq e^{-\beta^{2} \epsilon m / 2}$.
Proof: The proof follows from Lemma 3.1.

## 4 Reliable Threshold Circuits

Suppose $f: \mathrm{B}^{n} \rightarrow \mathrm{~B}$ is a Boolean function, and $C$ is a Boolean circuit. We say that $C$ fails to compute $f$ on input $x$ if the output of $C$ on input $x$ is not $f(x)$, and that $C$
fails to compute $f$ if it fails to compute $f(x)$ for some input $x . C$ is $(\mu, \epsilon)$-resilient on $f$ for some $0 \leq \mu+\epsilon \leq 1$ if the probability that $C$ fails to compute $f$ is at most $\mu+\epsilon$ when each of the gates of $C$ is damaged independently with probability at most $\epsilon$. Note that $\mu \geq 0$, since the output gate will be damaged with probability $\epsilon$. Intuitively, the $\epsilon$ term in the $\mu+\epsilon$ is the probability of harming the output gate, and $\mu$ is the probability of harming the rest of the circuit. Our aim is to minimize $\mu$.

We wish to be able to deal with a worst-case scenario in which damage to a gate may cause adversarial behaviour. That is, a damaged gate may behave in the worst possible fashion. We will assume no bound on the fan-in and fan-out of $C$, and that reliable inputs are available. The latter assumption is not crucial, and can be replaced by an assumption that the inputs can be repeatedly sampled with independent failure probability at most $\epsilon$.

Theorem 4.1 Every function computed by a threshold circuit of fan-in $f$, size $z$, and depth $d$ can be computed by a $(\mu, \epsilon)$-resilient threshold circuit with size

$$
\frac{4 z}{\epsilon \beta^{2}}\left(\ln z+\ln \frac{2}{\mu}\right)+1
$$

and depth $2 d+1$, for all $1 / 4(f+1) \leq \epsilon<1 / 2(f+1)$ and $\mu>0$, where $\beta=$ $1 / 2 \epsilon(f+1)-1$.

Proof: Let $C$ be a circuit of fan-in $f$, size $z$, and depth $d$. We construct a new circuit $C^{\prime}$ as follows. Each wire in $C$ is replaced by a cable, which consists of $m$ wires ( $m$ will be given explicitly later). Each gate in $C$ will be replaced by a circuit that has two input cables and an output cable. A wire $w$ in one of these cables will be called correct if it always carries the same value as the wire in $C$ that the cable replaces. A cable will be called correct if at most $\theta m$ of its wires are incorrect ( $\theta$ will be given explicitly later).

Let $g$ be a gate in $C$ with inputs $x_{1}, \ldots, x_{f}$, and output $z$. The circuit corresponding to $g$ consists of two levels of gates. The first level consists of $m$ copies of $g$, with the $i$ th copy taking as input the $i$ th wire from each of the $f$ input cables. The second level of the circuit consists of $m$ majority gates, each of which has $m$ inputs, one from each of the copies of $g$. The outputs of these gates form the output cable for the circuit. Figure 1 shows the construction with $f=4$ and $m=6$.

Suppose that we damage each gate in $C^{\prime}$ independently with probability $\epsilon$, where $1 / 4(f+1) \leq \epsilon<1 / 2(f+1)$. We will analyze the probability that the output cable of a circuit corresponding to a gate is incorrect, assuming that its input cables are correct. Consider a circuit in $C^{\prime}$ corresponding to gate $g$ in $C$. Since its input cables are correct, at most $f \theta m$ of the copies of $g$ will be incorrect due to receiving a faulty input. In the worst case, it will take only a further $(0.5-f \theta) m$ faults in the copies of $g$ to make at least half of them incorrect. Therefore, the probability that more than


Figure 1: The reliable subcircuit corresponding to $g$.
half of the copies of $g$ are incorrect is $B((0.5-f \theta) m, m, \epsilon)$. The probability that the output cable is incorrect given that less than half of the first-level gates are incorrect is $B(\theta m, m, \epsilon)$. Therefore, the probability that the output cable is incorrect given that the input cables are correct is $B((0.5-f \theta) m, m, \epsilon)+B(\theta m, m, \epsilon)$. Therefore, taking $\theta=1 / 2(f+1)$, the probability that the output cable being incorrect given that the input cables are correct is, by Lemma 3.2, $2 B(m / 2(f+1), m, \epsilon) \leq 2 e^{-\beta^{2} \epsilon m / 2}$ where $\beta=1 / 2 \epsilon(f+1)-1$, provided $1 / 4(f+1) \leq \epsilon<1 / 2(f+1)$.

Since there are $z$ cables which may independently fail, and in the worst case the failure of a cable may result in the failure of the whole circuit, the probability that the cable representing the output of $C$ is incorrect is bounded above by $2 z e^{-\beta^{2} \epsilon m / 2}$. This is at most $\mu$ when

$$
m=\frac{2}{\epsilon \beta^{2}}\left(\ln z+\ln \frac{2}{\mu}\right) .
$$

Thus the output cable of the new circuit is incorrect with probability at most $\mu$. The circuit is completed by placing an $m$-input majority gate on the output cable. The probability that the output of this gate is incorrect is less than $\mu+\epsilon$. The total number of gates is $2 m z+1$, and the depth is $2 d+1$.

We can draw two conclusions from Theorem 4.1. Suppose we call the value $\mu^{-1} \in \mathrm{R}$ the accuracy of the circuit. Firstly, an arbitrary circuit can be made reliable under malicious faults with a log-linear increase in size. Secondly, accuracy can be increased to an arbitrary constant with a further increase linear in the original size.

Theorem 4.1 is only interesting when $\mu+\epsilon \leq 0.5$. Suppose we call a circuit $\epsilon$ resilient if it is $(\mu, \epsilon)$-resilient for some $\mu \in \mathrm{R}^{+}$such that $\mu+\epsilon \leq 0.5$. Taking $f=2$ and Carver Mead's example of $\epsilon=0.1$, we have:

Corollary 4.2 Every function computed by a threshold circuit of fan-in 2, size z, and depth $d$ can be computed by a 0.1 -resilient threshold circuit of size $63 z \log z+145 z+1$ and depth $2 d+1$.

If $\epsilon \leq 1 / 12$, our construction becomes even more practical.

Corollary 4.3 Every function computed by a threshold circuit of fan-in 2, size z, and depth $d$ can be computed by an $\epsilon$-resilient threshold circuit of size $48 z \log z+76 z+1$ and depth $2 d+1$, where $\epsilon \leq 1 / 12$.

Unfortunately, the general construction of Theorem 4.1 can only be used for Carver Mead's test case of $\epsilon=1 / 10$ provided $f \leq 4$.

## 5 Summation Networks

The obvious way to implement a threshold gate in VLSI is to partition the gate into two independent parts, one that selects and sums the weights, and one that performs the thresholding operation. (Actually, we will see that only the former is necessary). If we intend to implement discrete neural networks in a VLSI-based or related technology, it makes sense to analyze a theoretical model which more accurately reflects this partitioning. One can also observe the same kind of partitioning in biological neurons. The summation of current is performed at the dendrite branches and soma, and the thresholding is performed at the axon hillock.

A summation function $f: \mathrm{B}^{n} \rightarrow \mathrm{~B}^{n \log n}$ is defined by a set of weights $w_{1}, \ldots, w_{n} \in \mathrm{~N}$ as follows: $f\left(x_{1}, \ldots, x_{n}\right)$ is the two's complement representation of

$$
\sum_{i=1}^{n} w_{i} x_{i} .
$$

That is, $f\left(x_{1}, \ldots, x_{n}\right)=\left(y_{1}, \ldots, y_{m}\right)$, where

$$
\sum_{i=2}^{m} 2^{m-i} y_{i}-2^{m-1} y_{1}=\sum_{i=1}^{n} w_{i} x_{i} .
$$

Note that $y_{1}$ is a sign bit in the sense that $\sum_{i=1}^{n} w_{i} x_{i} \geq 0$ iff $y_{1}=0$. A summation gate is a device which computes a summation function.

A summation circuit is the analog of a threshold circuit using summation gates instead of threshold gates. The following two results show that summation circuits are closely related to threshold circuits. We will assume for the remainder of the paper that the output of the summation circuit is a single bit (that is, the other outputs are discarded).

Theorem 5.1 For every threshold circuit of size $z$ and depth $d$ there exists a sum-
mation circuit of size $z$ and depth $d$ which computes the same function.
Proof: Without loss of generality, we can assume that each threshold gate has a threshold value of zero (see, for example, Parberry [9]). Each threshold gate in the original circuit can then be replaced by a summation gate in the obvious fashion by discarding all of the outputs of the summation gate except for the sign bit.

Theorem 5.2 For every summation circuit of size $z$ and depth $d$ there exists a thresh-
old circuit of size $z^{3} \log ^{2} z$ and depth $3 d$ that computes the same function.
Proof: Hofmeister, Hohberg, and Köhling [7] have shown that $n$ integers of at most $n$ bits can be added in depth 3 and size $O\left(n^{2}\right)$. Therefore, every addition gate in the original circuit can be replaced by a threshold circuit of $O\left((z \log z)^{2}\right)$ gates and depth 3.

## 6 Reliable Summation Circuits

Define the multi-majority function $M: \mathrm{B}^{f m} \rightarrow \mathrm{~B}^{f}$ as follows. $M$ has $f m$ inputs, divided into $f$ groups of $m$. There are $f$ outputs, each of which is the majority of a group of $m$ inputs. More formally,

$$
M\left(x_{11}, x_{12}, \ldots, x_{1 m}, x_{21}, x_{22}, \ldots, x_{2 m}, \ldots, x_{f 1}, x_{f 2}, \ldots, x_{f m}\right)=\left(y_{1}, y_{2}, \ldots, y_{f}\right)
$$

where for $1 \leq i \leq f, y_{i}=1 \mathrm{iff}$

$$
\sum_{j=1}^{m} x_{i j} \geq m / 2
$$

Lemma 6.1 The multi-majority function can be computed by a summation gate.
Proof: Without loss of generality, assume $m$ is one less than a power of two. Suppose $m=2^{r}-1$ for some $r \in \mathbf{N}$. Then

$$
M\left(x_{11}, x_{12}, \ldots, x_{1 m}, x_{21}, x_{22}, \ldots, x_{2 m}, \ldots, x_{f 1}, x_{f 2}, \ldots, x_{f m}\right)
$$

is computed with a summation gate as follows (see Figure 2). The weights used in the summation gate are

$$
w_{11}, w_{12}, \ldots, w_{1 m}, w_{21}, w_{22}, \ldots, w_{2 m}, \ldots, w_{f 1}, w_{f 2}, \ldots, w_{f m}
$$

where for $1 \leq i \leq f, 1 \leq j \leq m, w_{i j}=m^{f-i+1}$. The summation gate with these weights has $f r+1$ outputs (the extra one is the sign bit)

$$
y_{0}, y_{11}, y_{12}, \ldots, y_{1 r}, y_{21}, y_{22}, \ldots, y_{2 r}, \ldots y_{f 1}, y_{f 2}, \ldots, y_{f r}
$$

where

$$
M\left(x_{11}, x_{12}, \ldots, x_{1 m}, x_{21}, x_{22}, \ldots, x_{2 m}, \ldots, x_{f 1}, x_{f 2}, \ldots, x_{f m}\right)=\left(y_{11}, y_{21}, \ldots, y_{f 1}\right)
$$

Theorem 6.2 Every function computed by a summation circuit of size $z$ and depth $d$ can be computed by a $(\mu, \epsilon)$-resilient summation circuit with size

$$
\frac{2 z}{\epsilon \beta^{2}}\left(\ln z+\ln \frac{2}{\mu}\right)+1
$$

and depth $2 d+1$, for all $1 / 8 \leq \epsilon<1 / 2$ and $\mu>0$, where $\beta=1 / 4 \epsilon-1$.


Figure 2: A multi-majority gate.

Proof: The proof is similar to that of Theorem 4.1. As before, each wire in the original circuit is replaced by a cable of $m$ wires. A cable is said to be correct if at least $m / 2$ of its wires are correct. Each summation gate $g$ of fan-in $f$ with weights $w_{1}, \ldots, w_{f}$ in the original circuit is replaced by $m$ copies of a two-gate circuit which consists of a multimajority gate whose output is fed into a copy of $g$, as shown in Figure 3. The probability that an output cable of this circuit is incorrect, given that the $f$ input cables are correct is $B(m / 2, m, 2 \epsilon)$. By Lemma $3.2, B(m / 2, m, 2 \epsilon) \leq$ $e^{-\beta^{2} \epsilon m / 2}$, where $\beta=1 / 4 \epsilon-1$. Using an analysis similar to that used in the proof of Theorem 4.1, the new circuit is $(\mu, \epsilon)$-resilient if we take

$$
m=\frac{1}{\epsilon \beta^{2}}\left(\ln z+\ln \frac{2}{\mu}\right) .
$$

Returning to our example of $\epsilon=1 / 10$, we have:

Corollary 6.3 Every function computed by a summation circuit of size z, and depth d can be computed by an $\epsilon$-resilient summation circuit of size $6 z \log z+14 z+1$ and depth $2 d+1$, for all $\epsilon \leq 1 / 8$.


Figure 3: The reliable circuit corresponding to $g$.

## 7 Open Problems

The problem of whether an analog of Theorem 6.2 holds for unbounded fan-in threshold circuits is still open. One weakness of Theorem 6.2 is that it uses large weights. However, the weights are bounded above by $2^{O(z \log \log z)}$ in a circuit of size $z$, which is certainly better than the best known upper bound on the weights in such a circuit, $2^{O(z \log z)}$ (Muroga, Toda, and Takasu [8]). The problem of fault-tolerance in summation circuits with small weights is still open.

## References

[1] D. Angluin and L. Valiant. Fast probabilistic algorithms for Hamiltonian circuits and matchings. In Proceedings of the Ninth Annual ACM Symposium on Theory of Computing. ACM Press, 1977.
[2] H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23:493-507, 1952.
[3] R. L. Dobrushin and S. I. Ortyukov. Upper bounds for the redundancy of selfcorrecting arrangements of unreliable functional elements. Problems of Information Transmission, 13:203-218, 1977.
[4] T. Feder. Reliable computation by networks in the presence of noise. IEEE Transactions on Information Theory, 35(3):569-572, 1989.
[5] A. Hajnal, W. Maass, P. Pudlak, M. Szegedy, and G. Turan. Threshold circuits of bounded depth. In 28th Annual Symposium on Foundations of Computer Science, pages 99-110. IEEE Computer Society Press, 1987.
[6] T. A. Heppenheimer. Nerves of silicon. Discover, 9(2):70-79, February 1988.
[7] T. Hofmeister, W. Hohberg, and S. Köhling. Some notes on threshold circuits and multiplication in depth 4. Information Processing Letters, 39(4):219-226, 1991.
[8] S. Muroga, I. Toda, and S. Takasu. Theory of majority decision elements. J. Franklin Inst., 271:376-418, May 1961.
[9] I. Parberry. A primer on the complexity theory of neural networks. In R. Banerji, editor, Formal Techniques in Artificial Intelligence: A Sourcebook, volume 6 of Studies in Computer Science and Artificial Intelligence, pages 217-268. NorthHolland, 1990.
[10] I. Parberry. Circuit complexity and neural networks. In P. Smolensky, M. Mozer, and D. Rumelhart, editors, Mathematical Perspectives on Neural Networks, Developments in Connectionist Theory. Lawrence Erlbaum Associates, To Appear in 1992.
[11] I. Parberry and G. Schnitger. Parallel computation with threshold functions (preliminary version). In Proc. Structure in Complexity Theory Conference, volume 223, pages 272-290, Berkeley, California, June 1986.
[12] I. Parberry and G. Schnitger. Parallel computation with threshold functions. Journal of Computer and System Sciences, 36(3):278-302, 1988.
[13] N. Pippenger. On networks of noisy gates. In 26th Annual Symposium on Foundations of Computer Science, pages 30-38. IEEE Computer Society Press, 1985.
[14] N. Pippenger. Invariance of complexity measures for networks with unreliable gates. Journal of the ACM, 36(3):531-539, July 1989.
[15] K. Y. Siu and J. Bruck. On the power of threshold circuits with small weights. SIAM Journal on Discrete Mathematics, 4(3):423-435, 1991.
[16] J. von Neumann. Probabilistic logics and the synthesis of reliable organisms from unreliable components. In C. E. Shannon and J. McCarthy, editors, Automata Studies, pages 43-98. Princeton University Press, 1956.


[^0]:    *Research sponsored by the Air Force Office of Scientific Research, Air Force Systems Command, USAF, under grant number AFOSR 87-0400.
    ${ }^{\dagger}$ Research supported by NSF grant CCR-8805978. Dept. of Computer Science, Penn State Univ., 333 Whitmore Lab., University Park, PA 16803.
    ${ }^{\ddagger}$ Dept. of Computer Sciences, Univ. of North Texas, P.O. Box 13886, Denton, TX 76203-3886.

