{"id":15927,"date":"2022-02-14T14:38:33","date_gmt":"2022-02-14T19:38:33","guid":{"rendered":"https:\/\/web.uri.edu\/cs\/?p=15927"},"modified":"2022-02-14T15:57:56","modified_gmt":"2022-02-14T20:57:56","slug":"dr-y-william-yu","status":"publish","type":"post","link":"https:\/\/web.uri.edu\/cs\/dr-y-william-yu\/","title":{"rendered":"Dr. Y. William Yu"},"content":{"rendered":"<h2>On the relationship between MinHash, Minimizers, and CNNs: an exploration through hash functions<\/h2>\n<h3>Date: February 25th, 2022 @ 3pm-4pm<\/h3>\n<h3>Location: <a href=\"https:\/\/goo.gl\/maps\/c6pJ7DAeL6o4J4SU7\">Center for Biotechnology and Life Sciences<\/a>. Room 100<\/h3>\n<h3>Host: Noah Daniels<\/h3>\n<div class=\"cl-tiles halves\">\n<div><a class=\"cl-button  \" href=\"https:\/\/www.utsc.utoronto.ca\/cms\/yun-william-yu\" title=\"\">Dr. Y. William Yu's Website<\/a> &nbsp;&nbsp;<\/div>\n<div><div class=\"cl-date  \" data-hash=\"fb004e488e5a08eb087d9b9b0e0bfba1\"><form method=\"post\" action=\"https:\/\/web.uri.edu\/cs\/wp-content\/plugins\/uri-component-library\/inc\/cl-ics.php\"><input type=\"hidden\" name=\"date_start\" value=\"20260225\"><input type=\"hidden\" name=\"summary\" value=\"Dr. Y. William Yu's Seminar\"><input type=\"hidden\" name=\"filename\" value=\"dr-y-william-yus-seminar.ics\"><input type=\"submit\" value=\"Add to Calendar\"><\/form><div class=\"cl-date-download-dialogue\"><div>Add to calendar?<\/div><div><div class=\"cl-date-download-cancel\">Cancel<\/div><div class=\"cl-date-download-confirm\">Add<\/div><\/div><\/div><div class=\"cl-date-content-wrapper\" title=\"Add to my calendar\"><div class=\"cl-date-content\"><div class=\"cl-date-download-indicator\"><\/div><div class=\"cl-date-month\">February<\/div><div class=\"cl-date-day\">25<\/div><\/div><\/div><div class=\"cl-date-caption-wrapper\"><div class=\"cl-date-caption\">Dr. Y. 
William Yu's Seminar<\/div><\/div><div class=\"cl-date-download-notice\">Check your downloads folder for<\/br>dr-y-william-yus-seminar.ics<\/div><\/div><\/div>\n<\/div>\n<div><section class=\"cl-wrapper cl-boxout-wrapper\"><div class=\"cl-boxout  \"><h1>Talk Description<\/h1><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/p>\n<p style=\"text-align: left\">The selection of subsampled k-mer features in strings is one of the primitive tasks enabling efficient genomics algorithms. These subsampled k-mers have been employed for everything from accelerated read mapping to approximate pairwise distances and taxonomic classification. One of the classical means of selecting k-mers is to apply some hash function to the k-mers and keep only the minimum hashed values. When the hash function has the property of min-wise independence, the resulting algorithm is known as MinHash, which is a probabilistic sketch for estimating the Jaccard distance between sets. When the hash function is applied on a sliding window of k-mers on a string, we get a canonical set of &#8220;minimizers&#8221; that can be used as a smaller set of anchors for matching similar strings. However, although related, MinHash and minimizers do not require the same properties from the hash functions used.<\/p>\n<p style=\"text-align: left\">In this talk, we discuss three loosely related topics. First, we prove that we can use a floating-point encoding to reduce the space complexity of storing MinHash values from O(log n) to O(log log n), using the Flajolet-Martin trick of LogLog counters. Second, we show that selecting minimizer-like anchors for string matching in genomics may in fact not be best achieved through uniform random hashing; instead, we may want the k-mer selection method to have properties other than min-wise independence. 
Third and finally, we prove that convolution with a spherical Gaussian random variable results in a hash family that preferentially selects more distinct k-mers as extrema; this is not useful for designing classical hash functions because the output is non-uniform and continuous, but this fact allows us to interpret the initial layers of a convolutional neural network&#8212;the convolutional filters followed by max-pooling&#8212;as a minimizer-equivalent feature-selection operation.<\/p>\n<p style=\"text-align: left\">No knowledge of genomics will be assumed.<br \/>\nJoint work with Jim Shaw and Griffin Weber.<\/p>\n<p><\/p><\/div><\/section><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>On the relationship between MinHash, Minimizers, and CNNs: an exploration through hash functions Date: February 25th, 2022 @ 3pm-4pm Location: Center for Biotechnology and Life Sciences. Room 100 Host: Noah Daniels &nbsp;&nbsp;<\/p>\n","protected":false},"author":4601,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":"","_links_to":"","_links_to_target":""},"categories":[85],"tags":[],"class_list":["post-15927","post","type-post","status-publish","format-standard","hentry","category-seminars"],"acf":[],"_links":{"self":[{"href":"https:\/\/web.uri.edu\/cs\/wp-json\/wp\/v2\/posts\/15927","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/web.uri.edu\/cs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/web.uri.edu\/cs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/web.uri.edu\/cs\/wp-json\/wp\/v2\/users\/4601"}],"replies":[{"embeddable":true,"href":"https:\/\/web.uri.edu\/cs\/wp-json\/wp\/v2\/comments?post=15927"}],"version-history":[{"count":5,"href":"https:\/\/web.uri.edu\/cs\/wp-json\/wp\/v2\/posts\/15927\/revisions"}],"predecessor-version":[{"id":15951,"href":"https:\/\/web.uri.edu\/cs\/wp-json\/wp\/v2\/po
sts\/15927\/revisions\/15951"}],"wp:attachment":[{"href":"https:\/\/web.uri.edu\/cs\/wp-json\/wp\/v2\/media?parent=15927"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/web.uri.edu\/cs\/wp-json\/wp\/v2\/categories?post=15927"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/web.uri.edu\/cs\/wp-json\/wp\/v2\/tags?post=15927"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}