CSE5313: Coding and Information Theory for Data Science
Instructor: Netanel Raviv (netanel.raviv@wustl.edu)
TA in Charge: Junsheng Liu (junsheng@wustl.edu)
Time/Location: Tuesdays & Thursdays, 11:30am–12:50pm, Whitaker 218
Course Website: wustl.instructure.com/courses/155103
Piazza: piazza.com/wustl/fall2025/fall2025cse531301
Overview
Coding and information theory emerged in the mid-20th century as a mathematical theory of communication over noisy channels. In recent decades, it has grown into a vast field that touches most aspects of handling large datasets. The course will begin with the classical mathematical theory and its basic communication applications, then continue to contemporary applications in storage, computation, privacy, machine learning, and emerging technologies such as networks, blockchains, and DNA storage.
Prerequisites
Prior knowledge in:
- Algebra (such as Math 309 or ESE 318)
- Discrete math (such as CSE 240 or Math 310)
- Probability (such as Math 2200 or ESE 326)
Some mathematical maturity is also assumed.
Format
Lectures
- Tuesdays and Thursdays at 11:30am in Whitaker 218
- Lectures will not be recorded or streamed online
- Attendance and participation are highly encouraged
Exams
- Midterm: October 27th, 2025, 6:30–8:30pm, Louderman 458
  - The midterm will contain written-response questions.
  - IMPORTANT: At the end of the exam, you must scan your exam with your phone and upload it to Gradescope. Specific instructions will be given before the exam. Feedback will be provided only electronically; the hard copy will not be returned.
Office Hours
- The instructor will hold a weekly office hour: Tuesdays, 1–2pm in McKelvey 3035.
- Students are encouraged to attend.
Homework Assignments
- 3–5 homework assignments submitted via Gradescope, with written-response questions
- A separate final assignment involving a research paper
  - Students are encouraged to contact the instructor to discuss the choice of research paper; otherwise, a paper will be assigned
Preliminary List of Topics
- Mathematical background
  - Channel coding, finite fields, linear codes, bounds
- Coding for distributed storage
  - Locally recoverable codes, regenerating codes, bounds
- Introduction to information theory
  - Information entropy, mutual information, asymptotic equipartition property, data compression
- Coding and privacy
  - Information-theoretic privacy, secret sharing, multiparty computing, private information retrieval
- Coded computation
  - Vector-matrix and matrix-matrix multiplication, Lagrange codes, gradient coding, blockchains
- Emerging and advanced topics
  - Coding for DNA storage and forensic 3D fingerprinting
Textbooks
There are no formal reading assignments, but students are encouraged to use the following:
- Introduction to Coding Theory, R. M. Roth
- Elements of Information Theory, T. M. Cover and J. A. Thomas
Slides for every lecture will be made available online.
Announcements
All course announcements will be made in class or posted on the course website.
Course Grade
The final grade (0–100) will be determined as follows:
- Homework assignments: 50%
- Midterm: 25%
- Final assignment: 25%
Letter grades will be assigned according to the following table:
| Letter | Range | Letter | Range | Letter | Range |
|---|---|---|---|---|---|
| A | [94, 100] | B− | [80, 84) | D+ | [67, 70) |
| A− | [90, 94) | C+ | [77, 80) | D | [64, 67) |
| B+ | [87, 90) | C | [74, 77) | D− | [61, 64) |
| B | [84, 87) | C− | [70, 74) | F | (−∞, 61) |
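As an illustration only, the weighting and the letter-grade table above can be sketched in Python. The function names, the assumption that each component is scored on a 0–100 scale, and the absence of any rounding are choices made for this sketch, not stated course policy.

```python
import bisect

# Weights from the syllabus: homework 50%, midterm 25%, final assignment 25%.
WEIGHTS = {"homework": 0.50, "midterm": 0.25, "final_assignment": 0.25}

# Lower bounds of the letter ranges, in ascending order, copied from the table.
CUTOFFS = [61, 64, 67, 70, 74, 77, 80, 84, 87, 90, 94]
LETTERS = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A"]


def final_score(scores):
    """Weighted average of component scores (assumed to be on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(score):
    """Map a final score to a letter; each range includes its lower bound."""
    # bisect_right counts how many cutoffs are <= score, which is exactly
    # the index of the matching letter in LETTERS.
    return LETTERS[bisect.bisect_right(CUTOFFS, score)]


if __name__ == "__main__":
    example = {"homework": 90, "midterm": 80, "final_assignment": 100}
    total = final_score(example)
    print(total, letter_grade(total))
```

For the hypothetical scores in the example, the weighted total is 0.5 × 90 + 0.25 × 80 + 0.25 × 100 = 90, which falls in the A− range.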
Appeals
- Appeals must be submitted through Gradescope within 7 days of work being returned
- Provide a detailed explanation supporting your appeal
Late Days
- Each student has a budget of five late days for homework submissions
- Assignments are due by 8:59pm CDT on the due date
- Any part of a late day counts as a full late day
- No more than two late days can be used for any one homework
- You are responsible for tracking your late-day usage
- Once all late days are used, late homework will be accepted only in cases of medical or family emergency
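The late-day rules above can be sketched as a small calculation. This is an illustration only: it assumes that a late day is a 24-hour period starting at the deadline, which the policy does not state explicitly, and the function names are hypothetical.

```python
import math

TOTAL_BUDGET = 5        # late days available per student
MAX_PER_HOMEWORK = 2    # cap on late days for any single homework


def late_days_charged(hours_late):
    """Late days consumed; any part of a day counts as a full day."""
    if hours_late <= 0:
        return 0
    # Assumption: a late day is a 24-hour window after the deadline.
    return math.ceil(hours_late / 24)


def within_policy(hours_late, days_already_used):
    """True if the submission fits both the per-homework cap and the budget."""
    needed = late_days_charged(hours_late)
    return (needed <= MAX_PER_HOMEWORK
            and days_already_used + needed <= TOTAL_BUDGET)
```

Under these assumptions, a submission 30 minutes past the deadline consumes one full late day, and a submission 30 hours late consumes two.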
Collaboration and Academic Integrity
- Discuss problems with peers, but write your solutions on your own
- In your submission, list all students with whom you discussed each problem and any significant external sources you used
- Lack of necessary citations is a violation of policy
- You may not use any solution keys, guides, or solutions from previous classes, similar courses, or textbooks, however obtained
- No collaboration is allowed during exams
- Violations may result in failing the class and formal academic integrity proceedings
Clarification Regarding Generative AI Tools
- The use of generative artificial intelligence tools (GenAI) is permitted with restrictions
- Do not ask GenAI for complete solutions
- Permitted uses:
  - Light document editing (grammar, typos, etc.)
  - Understanding background information
  - Seeking alternative explanations for course material
- Submission of AI-generated text is prohibited
- Beyond light editing, all submitted text must be written by the student
IMPORTANT:
- Every submitted assignment/project must include a “Use Of GenAI” paragraph summarizing any GenAI usage
- Failing to include this paragraph, or including untruthful statements in it, will be considered a violation of academic integrity
- The course staff reserves the right to summon any student for an oral exam regarding any submitted work, and to adjust the grade accordingly
  - The oral exam will focus on explaining your reasoning (no memorization required)
  - Students may be selected at random, not necessarily due to suspicion