Related Focus Area

Challenges / Needs from Amazon

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Amazon Researcher Name; Title; Contact Information
Relevant Publications; Websites; Videos
Sign up for Discussion Hour

Ideas / Solutions from Columbia

Cloud and Computing Systems for AI; AI Hardware Acceleration

We propose to design a system-on-chip (SoC) that recognizes speech signals and converts them to text while consuming less than 1 µW. Wirelessly transmitting text is orders of magnitude more power-efficient than transmitting raw speech. The SoC will therefore provide the sought-after capability of natural interaction with humans to resource-constrained mobile and embedded devices that have no screen, no touchpad, and not enough resources to transmit large speech data wirelessly. Our group has designed four ultra-low-power speech-recognition SoCs that represent the state of the art in the area. First, we explored the hybridization of analog-mixed-signal and digital hardware for computing and demonstrated the first 1-µW end-to-end voice activity detector (VAD) chip. Second, we devised an algorithm-hardware co-design approach, including data reuse, for a depthwise-separable convolutional neural network (CNN) chip and demonstrated keyword spotting (KWS) hardware that sets a new low-power record of 500 nW. Third, we created an event-driven spiking neural network (SNN) chip for speech command detection; the chip consumes less than 300 nW. Fourth, we devised divisive-normalization feature-extraction SNN hardware for background-noise-tolerant speech recognition. The chip maintains high recognition accuracy across a wide range of signal-to-noise ratios (SNRs) and noise types (e.g., traffic, cafe, train) while consuming less than 500 nW for end-to-end keyword spotting. While these works have shown promising results, one key challenge remains: they can recognize only a small number of speech commands. In this project, we will instead build a more general SoC that recognizes each phoneme of speech and constructs the text.
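As a rough, back-of-the-envelope illustration of why transmitting text is so much cheaper than transmitting speech, the sketch below compares the two data rates. Every figure in it (16 kHz sampling, 16-bit samples, 150 words per minute, 6 characters per word, 8 bits per character) is an illustrative assumption, not a number from the proposal, and radio energy is assumed to scale roughly with the number of bits sent.

```python
# Assumed raw speech format: 16 kHz sampling, 16-bit PCM (illustrative only).
SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 16

# Assumed speaking rate and text encoding (illustrative only).
WORDS_PER_MINUTE = 150
CHARS_PER_WORD = 6      # average word length including the trailing space
BITS_PER_CHAR = 8

# Bits per second needed to stream the raw audio.
speech_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE

# Bits per second needed to stream the recognized text instead.
text_bps = WORDS_PER_MINUTE * CHARS_PER_WORD * BITS_PER_CHAR / 60

ratio = speech_bps / text_bps
print(f"speech: {speech_bps} bit/s, text: {text_bps:.0f} bit/s, ratio: {ratio:.0f}x")
```

Under these assumptions the text stream needs roughly three orders of magnitude fewer bits than the raw audio, which is the gap the on-chip recognizer is meant to exploit.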

Mingoo Seok;
Fu Foundation School of Engineering and Applied Science;
Electrical Engineering;
ms4415@columbia.edu

Search and Information Retrieval; Sustainability

I am interested in means to monitor or model differential exposure to air pollutants as a function of socioeconomic status within urban areas. At present, such differential exposure can be complex and can reflect urban heat islands, access to medical care, and pollutants (lead, ozone); consequently, there is no known standard or metric for evaluation. In addition, any standard should incorporate changing climate metrics, e.g., increased fire frequency and the resulting pollutant particles. Such a metric would be useful in determining health risks and vulnerabilities, especially for communities of color. In turn, these standardized assessments can be used to determine appropriate resources (e.g., medical) that can be used to compensate for the degree of risk.

Lewis Ziska;
Mailman School of Public Health;
Environmental Health Sciences;
lhz2103@cumc.columbia.edu

Sustainability

As the aging of the US population accelerates, the number of older drivers continues to rise. In recent years, several studies have demonstrated that changes in driving performance and driving behaviors can be detected in older drivers with early-stage dementia and preclinical Alzheimer’s disease (AD), and that these changes may progress throughout the trajectory of AD. This project aims to employ AI models for early detection of dementia from driving trajectories. It will be based on the LongROAD study, sponsored by the AAA Foundation for Traffic Safety, which was designed to provide empirical evidence for understanding and meeting the safe mobility needs of older drivers in the United States. It is a multi-site prospective cohort study of active drivers aged 65 to 79 years at the time of enrollment. Since 2013, a total of 2990 active drivers have been recruited, and over 60 of them have been diagnosed with dementia. Naturalistic driving data for the first 30 months of follow-up, totaling 66.6 million miles, were acquired through an in-vehicle data recording device plugged into the OBD-II port of each study participant's primary vehicle. Since 2019, the in-vehicle data recording device has been replaced with a travel app, which uses different phone sensors and AI technology (e.g., a driver detection algorithm) to collect and process data for the study participants. Given a driver’s longitudinal monthly driving exposure measures, the goal of this project is to predict the driver’s probability of having dementia each month.
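The prediction task described above (monthly driving measures in, one probability per month out) can be sketched as follows. This is only a minimal illustration, not the project's model: the logistic form, the trailing-window averaging, the feature names, and the weights are all hypothetical placeholders, and a real model would be learned from the study data.

```python
import math

def monthly_risk(history, weights, bias, window=3):
    """Return one probability per month.

    history: list of per-month feature lists (hypothetical driving measures).
    weights, bias: hypothetical logistic-model parameters (would be learned).
    window: number of most recent months averaged before scoring.
    """
    probabilities = []
    for month in range(len(history)):
        # Average each feature over the trailing window ending at this month.
        recent = history[max(0, month - window + 1): month + 1]
        means = [sum(row[j] for row in recent) / len(recent)
                 for j in range(len(weights))]
        # Logistic link maps the weighted score to a probability in (0, 1).
        score = bias + sum(w * x for w, x in zip(weights, means))
        probabilities.append(1.0 / (1.0 + math.exp(-score)))
    return probabilities

# Hypothetical features per month: [trips per day, share of night driving].
driver_history = [[4.0, 0.20], [3.5, 0.15], [2.0, 0.05], [1.5, 0.02]]
risk = monthly_risk(driver_history, weights=[-0.8, -2.0], bias=0.5)
print([round(p, 3) for p in risk])
```

With these made-up negative weights, the estimated probability rises as trips and night driving decline; the direction and size of any real effect would have to come from the data.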

Sharon Di;
Fu Foundation School of Engineering and Applied Science;
Civil Engineering & Engineering Mechanics;
sharon.di@columbia.edu

Conversational AI/Natural Language Processing; Multisensory Multimodal AI; Fairness, Explainability, and Accountability in AI; AI for Human-Agent and Human-Robot Interaction; AI for Information Security; Cloud and Computing Systems for AI; Operations Research and Optimization; Search and Information Retrieval; Information and Knowledge Management

Telemedicine and digital phenotyping are revolutionizing the field of Psychiatry, from gaining a deeper understanding of the behavioral determinants of mental illness to improving the efficacy and timeliness of care. Mental health care providers are extremely eager to adopt cutting-edge technologies. However, issues of privacy and equitable access to technology are a serious concern. My work, funded by the National Institute of Mental Health, focuses on developing valid markers of social activity and social functioning for patients living with severe mental illness, leveraging intense data streams from digital devices. I am employing causal machine learning approaches to understand the causal relationship between patients’ social activity and disease progression, with the goal of identifying behavioral targets of treatment. I am actively collaborating with researchers at leading institutions for Psychiatric care (McLean Hospital, New York State Psychiatric Institute, Montefiore Hospital, Nathan Kline Institute) to safely and effectively translate these methodologies into clinical settings. My primary concern is to conduct research that meets patients’ and health care providers’ preferences and needs so that data science solutions can be seamlessly incorporated into everyday care. Health care institutes need to be equipped to navigate the revolution of digital health care. The earlier potential challenges are identified and met, the earlier we will be able to provide equitable access to high-quality care in Psychiatry.

Linda Valeri;
Mailman School of Public Health;
Biostatistics;
lv2424@columbia.edu

Cloud and Computing Systems for AI; Search and Information Retrieval

Predictive-Compression for Cloud Storage:

The overwhelming growth of digital data being uploaded and analyzed on the Cloud is imposing a real scalability challenge for enterprise storage, file systems, and databases (e.g., AWS Elasticsearch). The redundancy of maintaining very large yet low-entropy datasets (with complex semantic similarities) leads to surging storage costs and energy waste in data warehouses and server farms. Indeed, the traditional compression methods currently deployed in enterprise storage (e.g., Dedup and pattern-matching algorithms) provably fail to compress datasets with complex similarities, such as images, natural text, and ecommerce catalogues, despite their overwhelming redundancies.
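A small illustration of the gap described above, under stated assumptions: exact-match deduplication is modeled here as comparing SHA-256 digests of whole records, and similarity is measured as the Jaccard overlap of character 5-grams. The two catalogue entries are made up for the example; neither function is taken from any particular storage product.

```python
import hashlib

def shingles(text, k=5):
    """Set of all overlapping k-character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of the k-gram sets of two strings."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

# Two made-up catalogue entries that differ in a single character.
entry_a = "Blue cotton t-shirt, size M, crew neck, machine washable"
entry_b = "Blue cotton t-shirt, size L, crew neck, machine washable"

# Exact-match dedup: the records are merged only if their digests are equal.
exact_duplicate = (hashlib.sha256(entry_a.encode()).digest()
                   == hashlib.sha256(entry_b.encode()).digest())

similarity = jaccard(entry_a, entry_b)
print(f"exact duplicate: {exact_duplicate}, 5-gram Jaccard: {similarity:.2f}")
```

The digests differ, so hash-based dedup stores both records in full even though most of their 5-grams are shared.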

We propose a novel, scalable approach to address this challenge, namely the time-space tradeoff between compression and search in enterprise storage. This proposal will design I/O- and CPU-efficient storage and information-retrieval algorithms for massive, unstructured data on the Cloud. Our approach and technology leverage recent advances in ML, NLP, sketching, and similarity indexing to design lossless statistical compression algorithms that dramatically reduce storage space and cost in data warehouses (up to 2x-5x over state-of-the-art compression benchmarks in the industry) without compromising the CPU cost and latency of retrieval. We also design primary-storage indexing and data-summarization (sketching) algorithms for speeding up heavy database operations and search in NoSQL databases such as Elasticsearch.
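One generic way to see how a similarity index can feed a lossless compressor is to compress a record against a similar, already-stored record. The sketch below does this with the preset-dictionary feature of Python's standard `zlib` module. It is a stand-in technique chosen for illustration, not the proposal's algorithm; the records are made up, and the step that finds the similar reference record is assumed to exist elsewhere.

```python
import zlib

def compress_with_reference(record: bytes, reference: bytes) -> bytes:
    """Losslessly compress record using a similar record as a preset dictionary."""
    compressor = zlib.compressobj(zdict=reference)
    return compressor.compress(record) + compressor.flush()

def decompress_with_reference(blob: bytes, reference: bytes) -> bytes:
    """Invert compress_with_reference; the same reference must be supplied."""
    decompressor = zlib.decompressobj(zdict=reference)
    return decompressor.decompress(blob) + decompressor.flush()

# Made-up records; in practice a similarity index (assumed) would pick the reference.
reference = b"Blue cotton t-shirt, size M, crew neck, short sleeves, machine washable, imported"
record = b"Blue cotton t-shirt, size L, crew neck, short sleeves, machine washable, imported"

alone = zlib.compress(record)
with_reference = compress_with_reference(record, reference)
restored = decompress_with_reference(with_reference, reference)

print(len(record), len(alone), len(with_reference), restored == record)
```

The tradeoff mentioned in the text shows up directly: the record stored against a reference is smaller, but reading it back requires fetching the reference first.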

Omri Weinstein;
Fu Foundation School of Engineering and Applied Science;
Computer Science;
omri@cs.columbia.edu

AI for Information Security

Rating Phishing Websites:

It would be beneficial to develop “enhanced attribution” of phishers, or organized teams of phishers, and to assess the level of danger posed by their phishing campaigns aimed at ordinary users. This proposal is a step in that direction. It applies machine learning techniques to analyze the contents of the phishing websites the phishers have deployed. Each instance of a phishing website is valuable data whose analysis may reveal the danger the site poses, as well as information about the phisher or phishing team who composed that site. The level of danger is a function of the amount and kind of sensitive personal information the site attempts to steal. Profiling phisher behavior is useful as advanced threat intelligence and as an aid in predicting whose website they may target next as the source of a spoofing and phishing campaign. This profiling is accomplished by a focused analysis of the displayed input generated by the code the phisher has deployed across different websites. The model of phisher behavior may provide information that reveals motive and intent and may be useful in investigating organized phishing teams. Rating phishing sites may inform response strategies and provide more informed critical browser messaging to the user.
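To make the idea of a danger level based on the amount and kind of requested information concrete, here is a minimal rule-based sketch. It is not the proposed machine-learning approach: the sensitivity weights, the keyword matching on input names and types, and the sample page are all hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sensitivity weights per kind of requested data (illustrative only).
SENSITIVITY = {"email": 1, "password": 3, "card": 5, "ssn": 5}

class InputCollector(HTMLParser):
    """Collect a lowercase 'name type' string for every <input> element."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attr = dict(attrs)
            # Attribute values can be None when an attribute has no value.
            label = (attr.get("name") or "") + " " + (attr.get("type") or "")
            self.fields.append(label.lower())

def danger_score(html):
    """Sum the weights of the distinct sensitive kinds a page asks for."""
    collector = InputCollector()
    collector.feed(html)
    kinds = {kind for field in collector.fields
             for kind in SENSITIVITY if kind in field}
    return sum(SENSITIVITY[kind] for kind in kinds)

# Made-up phishing form asking for an email, a password, and a card number.
page = ('<form><input name="email" type="text">'
        '<input name="pw" type="password">'
        '<input name="card_number"></form>')
score = danger_score(page)
print(score)
```

A page that requests more kinds of data, or more sensitive kinds, receives a higher score; a learned model could replace the fixed weights while keeping the same input features.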

Sal Stolfo;
Fu Foundation School of Engineering and Applied Science;
Computer Science;
sjs11@columbia.edu