
Michelle Sweering (CWI)  String Sanitization under Edit Distance Supervisors: Solon Pissis and Leen Stougie Recorded full presentation Abstract Let W be a string of length n over an alphabet Σ, k be a positive integer, and S be a set of lengthk substrings of W. The ETFS problem asks us to construct a string X_{ED} such that: (i) no string of S occurs in X_{ED}; (ii) the order of all other lengthk substrings over Σ is the same in W and in X_{ED}; and (iii) X_{ED} has minimal edit distance to W. When W represents an individual’s data and S represents a set of confidential substrings, algorithms solving ETFS can be applied for utilitypreserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in O(kn²) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of Σ. Our algorithm is based on a nontrivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in O(n^{2δ}) time, for any δ>0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS. 