UC San Diego's Data Science Institute Is Teaching Machines to Think More Like Humans
A new generation of researchers is cracking the code on large language model behavior—and the implications could reshape artificial intelligence as we know it
[Photo: Mikhail Belkin]
In a gleaming laboratory at the University of California San Diego, Mikhail Belkin is doing something that sounds almost mystical: teaching artificial intelligence systems to think before they speak. His team at the Halıcıoğlu Data Science Institute (HDSI) has developed what they call a "control knob" for large language models—the powerful AI systems behind ChatGPT and Google Gemini that can write poetry, debug code, and answer complex questions with startling fluency.
But there's a catch. These AI marvels, for all their impressive capabilities, can be wildly unpredictable. They might generate biased content, spread misinformation, or exhibit toxic behavior with little warning. "It's like having a brilliant but erratic student," explains Belkin, a professor whose previous work helped establish the theoretical foundations of modern machine learning. "They can produce remarkable insights one moment and completely nonsensical or harmful outputs the next."
Peering Inside the Black Box
The breakthrough came through what Belkin calls "nonlinear feature learning"—a technique that allows researchers to identify and manipulate the underlying features that drive an AI system's responses. Think of it as understanding not just what ingredients go into a cake, but how each ingredient interacts with the others during baking.
Traditional approaches to AI safety have focused on filtering outputs after they're generated—essentially putting a mask over problematic responses. Belkin's team took a different approach, diving deep into the neural network's internal architecture to understand how decisions form before words appear on screen.
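A toy sketch makes the contrast with output filtering concrete. It assumes a model's hidden state is a plain vector and that a unit "concept direction" has already been identified. The steering strength `alpha` plays the role of the control knob. The vectors and function names are hypothetical, and this is a generic activation-steering illustration, not the HDSI team's released code.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def concept_score(hidden, direction):
    """How strongly a hidden state expresses the concept: its projection on the direction."""
    return sum(h * d for h, d in zip(hidden, direction))

def steer(hidden, direction, alpha):
    """Shift a hidden state along a unit concept direction by alpha.

    Positive alpha pushes the representation toward the concept, negative
    alpha pushes it away, and alpha = 0 leaves it unchanged.
    """
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Hypothetical 4-dimensional hidden state and concept direction.
hidden = [0.5, -1.0, 2.0, 0.25]
direction = normalize([1.0, 0.0, 1.0, 0.0])

for alpha in (-2.0, 0.0, 2.0):
    steered = steer(hidden, direction, alpha)
    print(f"alpha={alpha:+.1f}  concept score={concept_score(steered, direction):.3f}")
```

Because the direction has unit length, the concept score shifts by exactly `alpha`. In a real LLM the same addition would be applied to activations at one or more layers during the forward pass, before any text is produced. Which layers and scales work well is an empirical question.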
"We're gaining a deeper understanding of the AI's internal thought process," Belkin says. The technique has already shown promise in steering language models away from harmful outputs while preserving their creative and analytical capabilities.
Silicon Valley Comes to Campus
The work hasn't gone unnoticed by industry giants. In May 2025, NVIDIA—the chip company powering much of the AI revolution—donated a DGX B200 system to HDSI, one of the most powerful AI computing platforms available. "Leading companies like OpenAI, Meta, and xAI rely on tens of thousands of such GPUs to train models like Llama and ChatGPT," notes Hao Zhang, who leads the institute's MLSys Group. "This gift puts UC San Diego on the map as a place where world-class AI research can happen—not just in theory, but in practice."
The donation reflects a broader trend of tech companies recognizing universities as crucial partners in solving AI's thorniest problems. HDSI's industry partnership program has attracted collaborations with Intel, Viasat, and healthcare technology company Dexcom, creating a pipeline where academic research directly informs real-world applications.
Beyond the Laboratory Walls
What makes HDSI unusual isn't just the quality of its research, but its commitment to understanding AI's societal implications. The institute's curriculum doesn't just teach students to build better algorithms—it emphasizes what director Rajesh Gupta calls "awareness": the responsibility to consider how these powerful tools will reshape society.
"We must educate our students to not only be skilled in optimization methods, but also whether such an optimization should even be an objective, and under what guardrails," Gupta explains. It's a philosophy born from the recognition that today's computer science students will be tomorrow's architects of artificial intelligence.
The approach seems to be working. Since its founding in 2018, HDSI has grown to encompass nearly 5,000 students and has graduated more than 700 alumni now working in roles ranging from machine learning engineers to data analysts across Silicon Valley and beyond.
The Next Frontier
As large language models become increasingly sophisticated—and increasingly integrated into everything from search engines to medical diagnosis—the need for fine-grained control becomes more urgent. Belkin's team has made their code publicly available, encouraging other researchers to build on their work.
The implications extend far beyond academic laboratories. In an era where AI systems help make decisions about loan approvals, medical treatments, and criminal justice, the ability to understand and guide their behavior could determine whether artificial intelligence becomes humanity's greatest tool or its most dangerous creation.
"As LLMs become increasingly integrated into our daily lives, being able to understand and guide their behavior is paramount," says Gupta, who also serves as interim dean of UC San Diego's newly formed School of Computing, Information and Data Sciences.
The researchers at HDSI aren't just building better AI—they're working to ensure that as machines become more powerful, they also become more predictable, more controllable, and ultimately more aligned with human values. In a field where the pace of change often outstrips our ability to understand its consequences, that may be the most important research of all.
The Halıcıoğlu Data Science Institute continues to accept applications for graduate programs in data science, with research opportunities spanning artificial intelligence, machine learning systems, and computational social science.
Advancing Large Language Model Research Through Academic-Industry Collaboration: The Halıcıoğlu Data Science Institute Model
Abstract
The Halıcıoğlu Data Science Institute (HDSI) at the University of California San Diego has become an active center for large language model (LLM) research and academic-industry partnerships since its establishment in 2018. Now part of the newly formed School of Computing, Information and Data Sciences (SCIDS), HDSI pursues work on LLM control, safety, and applications alongside a range of commercial collaborations. This article examines HDSI's research on steering neural networks through feature learning, the theoretical work behind it, and the institute's partnerships with industry, including Intel, Viasat, and Dexcom. Through its interdisciplinary approach and emphasis on ethical AI development, HDSI aims to apply data science and artificial intelligence to pressing societal challenges.
Introduction
The rapid advancement of artificial intelligence and large language models has created an unprecedented demand for research institutions that can bridge the gap between theoretical computer science and practical applications. Under founding director Rajesh Gupta, the Halıcıoğlu Data Science Institute (HDSI) has brought together an interdisciplinary team of researchers from fields ranging from computer science and communications to medicine and philosophy. Founded in 2018 with philanthropic support from UC San Diego alumnus Taner Halıcıoğlu, HDSI now serves about 4,800 undergraduate, master's, and doctoral students each year, has nearly 50 faculty members and 15 postdoctoral researchers, and counts more than 700 alumni working in roles from machine learning engineer to data analyst.
The institute's integration into the newly established School of Computing, Information and Data Sciences (SCIDS) in 2024 represents a strategic consolidation of UC San Diego's computational resources. SCIDS will bring together faculty across disciplines to improve the human condition by better understanding how data shapes society, and to prepare the next generation of highly skilled workers driving artificial intelligence advancements. This organizational evolution positions HDSI to leverage both the computational infrastructure of the San Diego Supercomputer Center and its own expertise in data science education and research.
Large Language Model Research at HDSI
Breakthrough Research in Neural Network Control
HDSI researchers have contributed to the fundamental understanding and control of large language models. Mikhail Belkin, a professor at HDSI, part of the School of Computing, Information and Data Sciences (SCIDS), and his collaborators have developed a method for steering and modifying large language models (LLMs) more precisely.
The team's novel approach addresses one of the most critical challenges in LLM deployment: predictability and safety. "Currently, while LLMs demonstrate impressive abilities in generating text, translating languages and answering questions, their behavior can sometimes be unpredictable or even harmful," Belkin said. "They might produce biased content, spread misinformation or exhibit toxic language."
The method centers on "nonlinear feature learning," a technique the team used to identify and manipulate important underlying features within the LLM's complex network. The analogy is understanding the individual ingredients in a cake rather than only the finished product. Once these core components were identified, the researchers could guide the model's output in more desirable directions.
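The "identify" step can be sketched with a deliberately simple, linear stand-in: the difference between the mean hidden states of examples that do and do not exhibit a concept. This is not the team's nonlinear feature learning method. The 3-dimensional "activations" below are invented for illustration, and the function names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def concept_direction(with_concept, without_concept):
    """Unit vector from the mean 'without' activation to the mean 'with' activation."""
    mu_with = mean_vector(with_concept)
    mu_without = mean_vector(without_concept)
    diff = [a - b for a, b in zip(mu_with, mu_without)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def project(hidden, direction):
    """Scalar projection of a hidden state onto the direction."""
    return sum(h * d for h, d in zip(hidden, direction))

# Invented activations for prompts labeled as expressing / not expressing a concept.
with_concept = [[2.0, 0.1, 0.0], [1.8, -0.1, 0.2], [2.2, 0.0, -0.2]]
without_concept = [[0.0, 0.1, 0.0], [0.2, -0.1, 0.2], [-0.2, 0.0, -0.2]]

direction = concept_direction(with_concept, without_concept)
# Threshold halfway between the two class means, measured along the direction.
threshold = (project(mean_vector(with_concept), direction)
             + project(mean_vector(without_concept), direction)) / 2

new_activation = [1.9, 0.05, -0.1]
print(direction, project(new_activation, direction) > threshold)
```

Projecting onto such a direction gives a crude monitor for the concept, and adding a multiple of it to activations gives a crude steering signal. A single linear direction can miss concepts that a model encodes nonlinearly, which is one reason to prefer a nonlinear method.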
The research team includes Belkin; Daniel Beaglehole of UC San Diego's Computer Science and Engineering Department; Adityanarayanan Radhakrishnan of the Broad Institute of MIT and Harvard and Harvard SEAS; and Enric Boix-Adserà of MIT Mathematics and Harvard CMSA. The team has made its findings publicly available to encourage further work on AI safety and control.
Theoretical Foundations and Feature Learning
Belkin's broader research reflects HDSI's commitment to understanding the theoretical underpinnings of modern AI systems. He has focused on fundamental problems in machine learning, including how neural networks make accurate predictions while seemingly bypassing the curse of dimensionality (the exponential growth in data needed as the input dimension increases). One possible explanation is that common training algorithms for neural networks implicitly perform dimensionality reduction, a process called feature learning.
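One formalization of feature learning used in work from Belkin's group is the average gradient outer product (AGOP). For a predictor f and inputs x_1, ..., x_n, it is M = (1/n) Σᵢ ∇f(xᵢ)∇f(xᵢ)ᵀ. Its top eigenvectors point along the input directions f actually depends on, which is a form of dimensionality reduction. The stdlib-only sketch below estimates the AGOP of a hypothetical target f(x) = sin(w·x) on 6-dimensional inputs by finite differences and recovers w by power iteration. The target, sample size, and step size are illustrative assumptions.

```python
import math
import random

def f(x, w):
    """Hypothetical target: depends on the 6-D input only through w . x."""
    return math.sin(sum(a * b for a, b in zip(w, x)))

def grad(x, w, eps=1e-5):
    """Central finite-difference gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((f(xp, w) - f(xm, w)) / (2 * eps))
    return g

def agop(samples, w):
    """Average gradient outer product: mean of grad(x) grad(x)^T over the samples."""
    d = len(samples[0])
    m = [[0.0] * d for _ in range(d)]
    for x in samples:
        g = grad(x, w)
        for i in range(d):
            for j in range(d):
                m[i][j] += g[i] * g[j] / len(samples)
    return m

def top_eigenvector(m, iters=100):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    v = [1.0] * len(m)
    for _ in range(iters):
        u = [sum(mij * vj for mij, vj in zip(row, v)) for row in m]
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    return v

random.seed(0)
w = [0.6, -0.8, 0.0, 0.0, 0.0, 0.0]  # unit vector; only 2 of 6 coordinates matter
samples = [[random.gauss(0, 1) for _ in range(6)] for _ in range(200)]
v = top_eigenvector(agop(samples, w))
alignment = abs(sum(a * b for a, b in zip(v, w)))
print(f"alignment of top AGOP eigenvector with w: {alignment:.4f}")
```

Analytically, ∇f(x) = cos(w·x)·w, so the AGOP equals mean(cos²(w·x))·wwᵀ. That is a rank-one matrix whose only nonzero eigenvector is w, even though the input has six coordinates.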
The institute's LLM research spans both theoretical advances and practical applications. Belkin's recent work examines surprising mathematical and statistical phenomena in deep learning, particularly feature learning and over-parameterization (models with far more parameters than training examples). This foundational research is important for developing more reliable and interpretable AI systems.
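Over-parameterization can be seen in the simplest possible setting: linear regression with more parameters p than training points n. Infinitely many weight vectors then fit the training data exactly. A standard choice is the minimum-norm solution w = Xᵀ(XXᵀ)⁻¹y, which is also where gradient descent started from zero converges for squared loss. The stdlib-only sketch below uses random data with n = 5 and p = 20; the sizes and data are illustrative assumptions, not taken from Belkin's experiments.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def min_norm_fit(X, y):
    """Minimum-norm interpolating weights w = X^T (X X^T)^{-1} y (rows of X independent)."""
    gram = [[dot(xi, xj) for xj in X] for xi in X]
    coeffs = solve(gram, y)
    return [sum(c * row[j] for c, row in zip(coeffs, X)) for j in range(len(X[0]))]

random.seed(1)
n, p = 5, 20  # more parameters than training points
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]
w = min_norm_fit(X, y)
max_train_error = max(abs(dot(row, w) - t) for row, t in zip(X, y))
print(f"{p} parameters, {n} points, max training error {max_train_error:.1e}")

# Any other interpolant is w plus a null-space vector z of X, and is strictly longer.
r = [random.gauss(0, 1) for _ in range(p)]
row_space_part = min_norm_fit(X, [dot(row, r) for row in X])  # projection of r onto row space
z = [ri - qi for ri, qi in zip(r, row_space_part)]
w_other = [wi + zi for wi, zi in zip(w, z)]
print(dot(w, w) < dot(w_other, w_other))
```

The toy shows only the interpolation side of over-parameterization. Whether such interpolating models also generalize well is the harder question this line of research studies.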
Interdisciplinary Applications and Workshops
HDSI has also become a hub for LLM research through the academic events it organizes. The 2024 UCSD HDSI-TILOS "LLM Meets Theory" Workshop brought together prominent researchers to discuss the future of mathematical and scientific theory and large language models (LLMs). Such workshops serve as forums for advancing the theoretical understanding of LLMs while addressing practical implementation challenges.
The institute's research also extends beyond traditional computer science boundaries. Faculty member Zhiting Hu has secured federal funding for AI research with defense applications.
Commercial Partnerships and Industry Collaboration
Strategic Industry Partnership Model
HDSI has developed an approach to industry collaboration that goes beyond traditional university-corporate relationships. Its active industry relations program is designed to foster collaborations around talent, student projects, data sharing, research, and faculty interactions, as well as continuing education and leadership in data science.
This strategic approach has attracted major technology companies seeking to leverage HDSI's research capabilities while providing students with real-world experience. The institute's partnership model focuses on mutual benefit, offering companies access to cutting-edge research and talented students while providing the academic community with industry insights and resources.
Intel Partnership: A Model for Deep Collaboration
The partnership between HDSI and Intel exemplifies the institute's approach to industry collaboration. By 2021 it had grown to six HDSI faculty members and nine students working on seven projects with 11 Intel staff members, a relationship that extends well beyond internships or one-off consulting arrangements.
HDSI faculty member Bradley Voytek stresses how quickly the field moves and why industry partnerships matter: "Data Science is always changing. New technologies, new methods, algorithms and bigger data -- by the time our first-year Data Science undergraduates graduate from UC San Diego, half of the data science tech stack will have changed. A lot of this change is being spurred on by industry leaders, such as Intel."
The Intel collaboration gives students access to industry-scale data and mentorship. Students work with Intel data and interact with senior leaders who rely on their recommendations. Intel's senior leadership team mentors them through project-based teamwork, with weekly and sometimes daily interactions with Intel subject matter experts and HDSI faculty.
Viasat: Founding Industry Partnership
Viasat's commitment as a Founding Industry Partner demonstrates the value HDSI offers established technology companies. In that role, Viasat committed to contribute to HDSI, to help shape its degree programs, and to assist in developing translational programs ranging from short-term training to longer-term engagements with industry on data-science-engineered solutions.
The partnership model allows companies to influence curriculum development while gaining access to emerging talent. Through the partnership, Viasat hoped to highlight new opportunities for data scientists, data engineers, and experts with foundational knowledge of powerful tools such as machine learning and artificial intelligence. It also aimed to help provide the skills and thinking needed to extract critical insights from large volumes of data and make a positive impact on society and the economy.
Healthcare Industry Engagement: Dexcom Partnership
HDSI's expansion into healthcare technology partnerships illustrates the versatility of its collaboration model. San Diego-based Dexcom Inc. develops continuous glucose monitoring (CGM) systems that help people manage diabetes, and the partnership shows how HDSI's data science expertise can be applied to critical healthcare challenges.
Mark Derdzinski, Manager of Data Science at Dexcom, noted: "Dexcom's data science team is excited to join the Board of HDSI industry partners in 2021. Our growing analytics teams look forward to actively engaging in the HDSI community as we build our talent pipeline." The partnership exemplifies how HDSI serves as a bridge between academic research and practical healthcare applications.
Research Impact and Innovation
Educational Innovation and Workforce Development
HDSI's approach to education emphasizes both technical competency and ethical awareness. As founding director Rajesh Gupta has described it, teaching data science required changing the approach to curriculum and degree programs in fundamental ways. Courses are traditionally viewed through the lens of which knowledge and skills they impart; data science adds a new dimension, "awareness," which includes reproducibility, responsibility, and generalizability of results.
This educational philosophy ensures that HDSI graduates are prepared not only with technical skills but also with the ethical framework necessary to deploy AI systems responsibly. The emphasis on awareness and responsibility is particularly relevant in the context of LLM development, where issues of bias, misinformation, and harmful content generation are of paramount concern.
Computational Infrastructure and Resources
The integration of HDSI into SCIDS provides access to major computational resources through the San Diego Supercomputer Center. The LLM-steering research described earlier, for example, relied on the Expanse system at the San Diego Supercomputer Center and the Delta system at the National Center for Supercomputing Applications at the University of Illinois, with support from the NSF ACCESS program. Infrastructure of this kind is essential for large-scale LLM research and development.
A significant enhancement to HDSI's computational capabilities came in 2025 with NVIDIA's gift of a DGX B200 system to the MLSys Group and the Hao AI Lab, led by Hao Zhang, an assistant professor at the Halıcıoğlu Data Science Institute. Zhang described the DGX B200 as one of the most powerful AI systems available today, featuring extreme speed, large memory capacity for massive models, AI-ready design optimized for large language models and diffusion models, and plug-and-play functionality for researchers.
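Zhang's point about memory capacity comes down to simple arithmetic. Storing P parameters at b bytes each takes P × b bytes for the weights alone, and training needs several times more for gradients and optimizer state. The sketch below makes that estimate for a hypothetical 70-billion-parameter model. Two figures in it are assumptions to verify. The first is 1,440 GB as the aggregate GPU memory of one DGX B200, taken from NVIDIA's published specifications. The second is 16 bytes per parameter, a common rule of thumb for mixed-precision training with Adam.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

# Assumption: aggregate GPU memory of one DGX B200 (8 GPUs); check NVIDIA's datasheet.
ASSUMED_SYSTEM_MEMORY_GB = 1440

# Assumption: rule of thumb for mixed-precision Adam training
# (bf16 weights and gradients, fp32 master weights, two fp32 Adam moments).
TRAINING_BYTES_PER_PARAM = 16

def weights_gb(num_params, dtype):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

def training_state_gb(num_params):
    """Rough memory for weights, gradients and optimizer state, excluding activations."""
    return num_params * TRAINING_BYTES_PER_PARAM / 1e9

params = 70e9  # hypothetical 70-billion-parameter model
for dtype in BYTES_PER_PARAM:
    print(f"{dtype} weights: {weights_gb(params, dtype):.0f} GB")
state = training_state_gb(params)
print(f"training state: {state:.0f} GB; fits in assumed {ASSUMED_SYSTEM_MEMORY_GB} GB: "
      f"{state <= ASSUMED_SYSTEM_MEMORY_GB}")
```

Activations, key-value caches, and communication buffers add to these figures, so real capacity planning needs more headroom than this sketch suggests.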
The DGX B200 system represents a transformative addition to UC San Diego's research infrastructure. "Leading companies like OpenAI, Meta, and xAI rely on tens of thousands of such GPUs to train models like Llama and ChatGPT. DGX B200 systems bring that class of compute to our university," Zhang said. "This gift puts UC San Diego on the map as a place where world-class AI research can happen — not just in theory, but in practice."
High-performance computing resources let HDSI researchers pursue projects that would be impractical without substantial computational support, and they help the institute compete with better-resourced research institutions.
Future Directions and Implications
Scaling Academic-Industry Collaboration
HDSI's success in developing meaningful industry partnerships provides a model that other academic institutions could replicate. The institute's approach emphasizes long-term relationships, mutual benefit, and genuine collaboration rather than transactional arrangements. The HDSI Industry Relations team aims to expand this mutual data sharing and replicate the model, building a collaborative ecosystem in which computing original equipment manufacturers (OEMs), value-added resellers (VARs), and independent software vendors (ISVs) can share information and data and experiment with new insights and findings.
Addressing Societal Challenges
The research conducted at HDSI has implications far beyond academic advancement. Its work on LLM safety and control directly addresses concerns about AI's impact on society, in keeping with the SCIDS mission of better understanding how data shapes society.
The need for that work grows as LLMs become part of everyday life. "As LLMs become increasingly integrated into our daily lives, being able to understand and guide their behavior is paramount," said Rajesh Gupta, interim dean of SCIDS, founding director of HDSI, and a distinguished professor.
Conclusion
The Halıcıoğlu Data Science Institute represents a successful model for conducting cutting-edge research in large language models while maintaining strong partnerships with industry. Through its interdisciplinary approach, emphasis on ethical AI development, and commitment to practical applications, HDSI has positioned itself as a leader in the rapidly evolving field of artificial intelligence research.
The institute's integration into the School of Computing, Information and Data Sciences provides additional resources and institutional support for continued growth and innovation. As the demand for AI expertise continues to grow across industries, HDSI's model of combining rigorous academic research with meaningful industry collaboration offers a blueprint for addressing society's most pressing technological challenges.
The success of HDSI's approach to LLM research and industry partnerships demonstrates the value of institutions that can bridge the gap between theoretical advancement and practical application. As artificial intelligence continues to reshape society, institutions like HDSI will play a crucial role in ensuring that these powerful technologies are developed and deployed responsibly for the benefit of humanity.