Skip to Content
CSE5313CSE5313 Coding and information theory for data science (Lecture 1)

CSE5313 Coding and information theory for data science (Lecture 1)

Example problem

Get sum of original server data when any one of the servers is down.

-> Gradient coding for distributed machine learning.

Two shares give anything and on share gives nothing.

-> Scheme: Let p(x)=s+rxp(x)=s+rxf where rr is a random number. Share 1 is p(1)p(1) and share 2 is p(2)p(2). where ss is the final key. (The shamir secret sharing scheme)

Montage of topics

Storage system in the Wild

  • Huge amount of data worldwide.
  • Grows exponentially.
  • Must be reliably
  • Must be accessed easily

Challenge 1: reconstructing the data from subset of servers.

Challenge 2: repair and maintain the data.

Challenge 3: data access efficiency.

Privacy information retrieval

Retrieving data from a database without revealing which data is retrieved.

silly solution: download everything.

Coding for distributed computation

Problems in distributed computation:

  • Stragglers: 5% of the server is x5-x8 times slower than the rest.
  • Privacy of the data
  • Hardware failure
  • Adversaries: one adversary can corrupt the divergence of the computation.
  • Privacy of computation

Network coding

Passing information over network of nodes

DNA storage

GTAC

  • Super-dense
  • Super-durable
  • Non-volatile (no extra energy required to maintain the data)
  • Future proof (may be interested by future generations)

Challenges:

  • DNA synthesis is hard, only short strings (1000 nucleotides)
  • Very noisy.

Topics covered in this course

Part 1: Foundations

Foundations:

  • Finite fields, linear codes, information theory

Distributed storage:

  • Regenerating codes, locally recoverable codes

Privacy:

  • secret sharing, multi-party computation, private information retrieval

Midterm: (short and easy.. I guess)

Part 2: contemporary topics

Computation:

  • coded computation, gradient coding, private computation

Emerging/advanced topics:

  • DNA storage

Administration

Grading:

3-5 HW assignments (50%) Midterm (25%) Final assignment (25%)

  • For a recent paper: read, summarize, criticize and propose for future work.

Textbooks:

  • Introduction to Coding Theory, R. M. Roth.
  • Elements of Information Theory, T. M. Cover and J. A. Thomas.

The slides for every lecture will be made available online.

Clarification about Gen AI

The use of generative artificial intelligence tools (GenAI) is permitted. However, students are not permitted to ask GenAI for complete solutions, and must limit their interaction with GenAI to light document editing (grammar, typos, etc.), understanding background information, and to seeking alternative explanations for course material. In any case, submission of AI generated text is prohibited, and up to light editing, students must write all submitted text themselves.

IMPORTANT:

  • In all submitted assignments and projects, students must include a “Use Of GenAI” paragraph, which briefly summarizes their use of GenAI in preparing this assignment/project.
  • Failure to include this paragraph, or including untruthful statements, will be considered a violation of academic integrity.
  • The course staff reserves the right to summon any student for an oral exam regarding the content of any submitted assignment or project, and adjust their grade accordingly. The exam will focus on explaining the reasoning behind the work, and no memorization will be required. Students will be summoned at random, and not necessarily on the basis of any accusation or suspicion.

Channel coding

Input alphabet FF, output alphabet Φ\Phi.

e.g. F={0,1},RF=\{0,1\},\mathbb{R}.

Introduce noise: Pr(c receivedc transmitted)\operatorname{Pr}(c'\text{ received}|c\text{ transmitted}).

We use uu to denote the information to be transmitted

cc to be the codeword.

cc' is the received codeword. given to the decoder.

uu' is the decoded information word.

Error if uuu' \neq u.

A channel is defined by the three tuple (F,Φ,Pr(cc))(F, \Phi, \operatorname{Pr}(c'|c)).

Last updated on