Deep Neural Networks (DNNs) demand significant computational resources, prompting the emergence of bit-slice architectures that accelerate DNNs efficiently by exploiting high bit-precision reconfigurability and fine-grained sparsity through slice-level computation. However, fully utilizing slice-level sparsity remains challenging, so conventional bit-slice architectures exploit either input or weight sparsity, and only at a coarse-grained level. In this paper, we introduce a Bit-slice Architecture for DNN Acceleration (BADA) that simultaneously leverages both input and weight sparsity at coarse- and fine-grained levels. BADA features a novel architecture that skips computations for bit-slice chunks consisting entirely of zeros and also skips any operand pair in which either the input or the weight bit-slice is zero. The design comprises a front-end unit that generates bit-slices and selectively gathers only the non-zero slices, and a back-end unit equipped with a signed multiply-and-accumulate (MAC) unit that processes the gathered non-zero bit-slices. Additionally, we present two algorithmic optimizations that further enhance the efficiency and performance of BADA. First, we propose a novel bit-slice representation that supports 8-bit data without incurring additional hardware overhead, whereas conventional bit-slice representations are limited to 6-bit or 7-bit data under similar constraints. Second, we introduce a method that narrows the weight distribution during training, increasing the proportion of zero-valued higher-order bit-slices and thereby further enhancing slice-level weight sparsity. Experimental results demonstrate that BADA achieves 2.67× higher throughput, 1.52× higher area efficiency, and 2.15× higher energy efficiency than LUTein, the state-of-the-art bit-slice architecture.
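To make the slice-level skipping concrete, the following minimal Python sketch illustrates the idea under assumed parameters; it is not the BADA hardware or its signed slice representation. Unsigned 8-bit operands are decomposed into 2-bit slices, the non-zero slices are gathered (mirroring the front-end), and only slice pairs where both operands are non-zero are multiplied and accumulated (mirroring the back-end). The 2-bit slice width and the unsigned decomposition are illustrative assumptions.

SLICE_BITS = 2
NUM_SLICES = 4  # an 8-bit value splits into four 2-bit slices

def to_slices(value: int) -> list[int]:
    """Decompose an unsigned 8-bit value into NUM_SLICES slices, LSB first.
    (Illustrative unsigned decomposition; BADA's actual representation is signed.)"""
    assert 0 <= value < 256
    return [(value >> (SLICE_BITS * i)) & ((1 << SLICE_BITS) - 1)
            for i in range(NUM_SLICES)]

def sparse_slice_mac(inputs: list[int], weights: list[int]) -> int:
    """Dot product that skips any slice pair with a zero on either side."""
    acc = 0
    for x, w in zip(inputs, weights):
        # Gather only the non-zero slices of each operand (front-end behavior).
        xs_nz = [(i, s) for i, s in enumerate(to_slices(x)) if s != 0]
        ws_nz = [(j, s) for j, s in enumerate(to_slices(w)) if s != 0]
        # Multiply-and-accumulate over the surviving slice pairs (back-end behavior),
        # shifting each partial product by the combined slice significance.
        for i, sx in xs_nz:
            for j, sw in ws_nz:
                acc += (sx * sw) << (SLICE_BITS * (i + j))
    return acc

# Sanity check: the slice-level result matches the full-precision dot product.
xs, ws = [3, 0, 200, 17], [64, 91, 0, 5]
assert sparse_slice_mac(xs, ws) == sum(x * w for x, w in zip(xs, ws))

Because a zero slice on either side contributes nothing to the partial product, every skipped pair is exact rather than approximate, which is why slice-level sparsity can be exploited without any loss of accuracy.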