The paper presents a highly efficient way of simulating the dynamic behavior of deformable objects by means of the finite element method (FEM), with computations performed on Graphics Processing Units (GPUs). The presented implementation reduces memory-access bottlenecks by grouping the necessary data per node pair, in contrast to the classical per-element approach. This strategy reduces the memory access patterns that are ill-suited to the GPU memory architecture. Furthermore, the implementation takes advantage of the underlying sparse block matrix structure, and the paper demonstrates how to avoid potential bottlenecks in the algorithm. To achieve plausible deformation behavior under large local rotations, the objects are modeled with a simplified co-rotational FEM formulation.
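To illustrate the difference between per-element and per-node-pair data layouts, here is a minimal sketch in Python. It assumes linear tetrahedral elements, each with a precomputed 12×12 stiffness matrix, and it omits the per-element rotations of the co-rotational formulation. All function names are hypothetical, and the plain loops only stand in for GPU threads. This is not the paper's implementation. In the per-element variant, several elements add into the same node, which on a GPU would call for atomic writes. In the per-node-pair variant, the 3×3 blocks K_ij of the sparse block matrix are gathered once, so each row (node) i can compute its force f_i = Σ_j K_ij u_j independently.

```python
from collections import defaultdict


def zero_block():
    """A 3x3 block of zeros (one node pair, x/y/z components)."""
    return [[0.0] * 3 for _ in range(3)]


def element_block(Ke, a, b):
    """Extract the 3x3 sub-block coupling local nodes a and b of a 12x12 element matrix."""
    return [[Ke[3 * a + r][3 * b + c] for c in range(3)] for r in range(3)]


def assemble_node_pairs(elements, element_matrices):
    """Group all element contributions by global node pair (i, j) into block rows."""
    pairs = defaultdict(zero_block)
    for nodes, Ke in zip(elements, element_matrices):
        for a, i in enumerate(nodes):
            for b, j in enumerate(nodes):
                blk = pairs[(i, j)]
                sub = element_block(Ke, a, b)
                for r in range(3):
                    for c in range(3):
                        blk[r][c] += sub[r][c]
    # Block-row layout: for each node i, the list of (j, K_ij) sorted by column j.
    rows = defaultdict(list)
    for (i, j), blk in sorted(pairs.items(), key=lambda kv: kv[0]):
        rows[i].append((j, blk))
    return rows


def block_spmv(rows, u, num_nodes):
    """f_i = sum_j K_ij u_j; each row i is independent (one GPU thread per node)."""
    f = []
    for i in range(num_nodes):
        acc = [0.0, 0.0, 0.0]
        for j, blk in rows.get(i, []):
            for r in range(3):
                acc[r] += sum(blk[r][c] * u[j][c] for c in range(3))
        f.append(acc)
    return f


def per_element_forces(elements, element_matrices, u, num_nodes):
    """Classical per-element scatter: shared nodes receive writes from several elements."""
    f = [[0.0, 0.0, 0.0] for _ in range(num_nodes)]
    for nodes, Ke in zip(elements, element_matrices):
        ue = [u[n][c] for n in nodes for c in range(3)]
        fe = [sum(Ke[r][k] * ue[k] for k in range(len(ue))) for r in range(len(ue))]
        for a, n in enumerate(nodes):
            for c in range(3):
                f[n][c] += fe[3 * a + c]  # would need an atomic add on a GPU
    return f


if __name__ == "__main__":
    # Two tetrahedra sharing the face (1, 2, 3); synthetic symmetric matrices.
    elements = [(0, 1, 2, 3), (1, 2, 3, 4)]
    mats = [[[(e + 1.0) / (1 + abs(r - c)) for c in range(12)] for r in range(12)]
            for e in range(2)]
    u = [[0.1 * n, -0.2 * n, 0.05 * n * n] for n in range(5)]
    rows = assemble_node_pairs(elements, mats)
    print(block_spmv(rows, u, 5))
    print(per_element_forces(elements, mats, u, 5))
```

Both functions produce the same forces up to floating-point rounding. The node-pair version trades a one-time grouping step for write-conflict-free force evaluation, which is the property the abstract attributes to grouping data per node pair.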