A couple of months back, my elder brother (almost 48 years old) asked me a question that prompted me to research real-world implementations of Big Data solutions for teaching platforms and then write this blog.
His question was simple: it takes him ages to send the ~2 GB video files he records for his students. How can he manage his content, such as PDF documents and video recordings? He also wants an interactive platform that records what he writes on the board and automatically prepares it as material that students can refer to after class, anytime, for study.
Hence, I have come up with the Big Data solution below, which can be built into a viable product for any tutoring group. Please feel free to extend it or advise on the design and implementation details. It is a reference architecture for a real-world implementation from scratch, and it compares deployment on Google Anthos (a hybrid solution combined with on-prem providers) with deployment on GCP (public cloud) for this teaching domain.
High Level Design:
1) Know your customer – what is the business goal?
2) Small, medium or large customer? – Define technologies for scalability
3) Define the business model – how are you going to make money?
– Single-user payments – discounts
– Subscription-based – value for a 1-year contract with lower monthly payments – rewards for consumers (certificates)
4) Advertising & Sponsorship – provide consumer/client data so that customers are willing to sponsor
– Marketing channels – email, Facebook, YouTube, Google Ads, etc.
– Customer management and automation – make a good first impression
– Transactional emails help build trust and improve the client's revenue
5) Value-based subscription model (attractive content costs money)
6) Content assessment and quality
7) Identify the platform – create a marketplace for customers
8) Personalization for customers
9) Start development – Minimum Viable Product (MVP) – core features only [POC]
– Validate your idea ASAP without spending much on design, features, etc. [Product design workshops – design thinking]
– Get feedback – plan future development
– User acceptance – plan the future budget
– Estimate development effort
10) Choosing the best tech stack
– Purpose: are you building a prototype or resilient, enterprise-level software?
– Use case: will the app rely on ML or data science?
– Architecture: monolith (one app with many sub-functionalities) or microservices (separate services for different functionalities)
– Popularity: identify technologies and SMEs (specialists) in that field who can help
Establish requirements, build the solution, measure, and repeat.
– Backend: Python, Django, Go, PHP https://www.merixstudio.com/blog/backend-development/
– Frontend: https://www.merixstudio.com/work/u-project/
– Mobile: a cross-platform mobile development solution, using frameworks such as React Native or Flutter.
11) Hire the best people – product designers & project managers
– Outsource if necessary
12) Core (basic) features of the product
a) Search tool: categories & filtering
b) Recommendations: machine learning
c) Dashboard: analytics
d) Course page
e) Reviews & ratings
f) Payment systems: local and global payment methods
g) Customized notifications
h) Admin panel: managing content and user accounts, generating statistics
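To make feature a) more concrete, here is a minimal in-memory sketch of category filtering plus keyword search. The `Course` record, its fields and the `search_courses` function are hypothetical names chosen for illustration; a production search tool would more likely run against a search index or database than a Python list.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Course:
    # Hypothetical catalog record; field names are illustrative only.
    title: str
    category: str
    rating: float = 0.0
    tags: List[str] = field(default_factory=list)


def search_courses(
    courses: Iterable[Course],
    query: str = "",
    category: Optional[str] = None,
    min_rating: float = 0.0,
) -> List[Course]:
    """Filter by category and minimum rating, match the query against the
    title or tags (case-insensitive), and sort by rating, highest first."""
    q = query.strip().lower()
    results = []
    for c in courses:
        if category is not None and c.category.lower() != category.lower():
            continue
        if c.rating < min_rating:
            continue
        if q and q not in c.title.lower() and not any(q in t.lower() for t in c.tags):
            continue
        results.append(c)
    return sorted(results, key=lambda course: course.rating, reverse=True)


catalog = [
    Course("Algebra Basics", "Math", 4.5, ["equations"]),
    Course("Calculus I", "Math", 4.8, ["limits", "derivatives"]),
    Course("Organic Chemistry", "Science", 4.2, ["reactions"]),
]
print([c.title for c in search_courses(catalog, category="math")])
```

Filtering happens before sorting, so the ranking only considers matching courses; replacing the list with a database query could keep the same function signature.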
Low Level Design:
1) Understand the customer – customer growth over the years – scalability – design architecture – proper technologies & tools
2) Budget: CAPEX, OPEX
3) Resources – skilled workers
4) Timeline – deadlines
5) Security of the data
6) Data ingestion: lecture videos from teachers' mobiles and laptops, plus documents (Word, PDFs), etc.
7) Data processing: classification of videos, documents, PDFs, etc. into topics
8) Data consumption: end users (students) subscribe to topic content
9) 2 services defined for recorded replay and shared document links
10) 2 services defined for users: students (consumers) and teachers (producers)
11) 3 services defined for announcements, community interaction and Q&A
12) Storage – Data Lake
13) 2 load balancers, 2 gateways, 1 DNS server, 1 DHCP server, 1 Active Directory (security) server, 8 VMs for the different services, plus 2 VMs for secure storage and backup/recovery. Total: 17 VMs
14) Provision for yearly growth in data and users (20%)
15) Big Data tools: Kafka, Spark, Hive, Redis and Cassandra.
Sizing factors: compression ratio, intermediate data, 70% data consumption, and quarterly and yearly growth of data and users.
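The sizing factors above can be combined into a back-of-the-envelope estimate. This is a sketch under stated assumptions: the VM counts come from item 13 and the 20% growth from item 14, but the 2:1 compression ratio, 3x replication, 25% intermediate-data overhead, and reading "70% data consumption" as a 70% maximum disk utilization are illustrative guesses, not figures from a real deployment. `VM_PLAN`, `total_vms` and `provisioned_storage_tb` are hypothetical names.

```python
from typing import Dict

# VM counts from item 13.
VM_PLAN: Dict[str, int] = {
    "load_balancer": 2,
    "gateway": 2,
    "dns": 1,
    "dhcp": 1,
    "active_directory": 1,
    "service": 8,
    "storage_backup": 2,
}


def total_vms(plan: Dict[str, int]) -> int:
    """Sum the VM counts in a plan."""
    return sum(plan.values())


def provisioned_storage_tb(
    initial_tb: float,
    periods: int,
    growth_per_period: float = 0.20,      # 20% per year (item 14)
    compression_ratio: float = 2.0,       # ASSUMED 2:1 compression
    replication: int = 3,                 # ASSUMED 3 copies of each block
    intermediate_fraction: float = 0.25,  # ASSUMED temp/intermediate space
    max_utilization: float = 0.70,        # ASSUMED reading of "70%"
) -> float:
    """Raw disk (TB) to provision so the grown data stays within max_utilization."""
    grown = initial_tb * (1 + growth_per_period) ** periods
    stored = grown / compression_ratio * replication * (1 + intermediate_fraction)
    return stored / max_utilization


print(total_vms(VM_PLAN))  # 17
for year in range(4):
    print(year, round(provisioned_storage_tb(10.0, year), 1))
```

For quarterly planning, pass a per-quarter growth rate as `growth_per_period` and the number of quarters as `periods`.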
Microservices defined for each independent service module:
16) Payment system: gateway server and secure networking for payments
17) Reviews management system: content server and the associated frontend/backend service
18) Recommendation system: a service for both popular and personalized recommendations using machine learning.
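A recommendation service usually needs a baseline before any model is trained. The sketch below is a simple co-occurrence (item-to-item collaborative filtering) approach, not a trained machine learning model: "popular" counts views across all students, and "personalized" scores each unseen course by how much the student's history overlaps with the histories of other students who watched it. The function names and the `history` mapping of student to watched courses are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical input: student id -> set of course ids they have watched.
History = Dict[str, Set[str]]


def _top_k(scores: Counter, k: int) -> List[str]:
    # Highest score first; ties broken by course id so results are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


def popular(history: History, k: int = 3) -> List[str]:
    """Most-watched courses across all students."""
    return _top_k(Counter(c for courses in history.values() for c in courses), k)


def personalized(student: str, history: History, k: int = 3) -> List[str]:
    """Score each unseen course by summing, over other students who watched it,
    how many courses they share with `student`."""
    seen = history.get(student, set())
    scores: Counter = Counter()
    for other, courses in history.items():
        if other == student:
            continue
        overlap = len(seen & courses)
        if overlap:
            for course in courses - seen:
                scores[course] += overlap
    if not scores:
        # Cold start or no shared history: fall back to popular unseen courses.
        return [c for c in popular(history, k + len(seen)) if c not in seen][:k]
    return _top_k(scores, k)


history = {
    "asha": {"algebra", "calculus"},
    "ben": {"algebra", "calculus", "statistics"},
    "chen": {"algebra", "chemistry"},
    "dev": {"physics"},
}
print(popular(history))               # ['algebra', 'calculus', 'chemistry']
print(personalized("asha", history))  # ['statistics', 'chemistry']
```

When a student shares no history with anyone (the cold-start case), the sketch falls back to popular courses they have not seen yet; a trained model could later replace `personalized` behind the same service interface.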
Brother’s use case: small customer
A service module (say Serv_gd) keeps his videos in Google Drive, syncing them from his mobile device. Each type of content, whether the PDFs he prescribes during his lecture or the video and audio recordings of his classes, will have its own service, replicated across regions, data centers and racks for high availability. For this small use case caching servers may not be necessary, but the services still need to sync data across this granular architecture to stay consistent.
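Since the original pain point was resending ~2 GB videos, one way Serv_gd could avoid repeating work is chunked, resumable sync: hash fixed-size chunks and upload only those the remote copy lacks. This is a hypothetical sketch; it does not use the Google Drive API, and the 8 MiB chunk size and the function names are assumptions.

```python
import hashlib
from io import BytesIO
from typing import BinaryIO, List, Sequence

# ASSUMED chunk size; a real service would tune this for its network.
CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB


def chunk_hashes(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Return the SHA-256 hex digest of each fixed-size chunk in the stream."""
    digests = []
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        digests.append(hashlib.sha256(block).hexdigest())
    return digests


def chunks_to_upload(local: Sequence[str], remote: Sequence[str]) -> List[int]:
    """Indices of local chunks that the remote copy is missing or holds differently."""
    return [i for i, h in enumerate(local) if i >= len(remote) or remote[i] != h]


# Tiny 4-byte chunks so the example is easy to follow.
remote = chunk_hashes(BytesIO(b"aaaabbbbcccc"), chunk_size=4)
local = chunk_hashes(BytesIO(b"aaaaBBBBccccdd"), chunk_size=4)
print(chunks_to_upload(local, remote))  # [1, 3]
```

Fixed-size chunks work well for resuming an interrupted upload. If bytes are inserted in the middle of a file, every later chunk shifts and would be resent; that is where rolling-checksum matching (as rsync does) or content-defined chunking helps.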
Architect the frontend and backend to support the above use case. Services may have to be defined for content replay, or CDNs can be used for the high availability, consistency and scalability that are critical to the business.
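For the replay service, students need to seek within long recordings without downloading the whole file. HTTP handles this with `Range` requests (RFC 9110). The sketch below parses a single-range `bytes=` header into an inclusive byte span; multi-range requests are deliberately not supported, and `parse_range`/`read_range` are hypothetical helper names, not part of any framework.

```python
import re
from typing import Optional, Tuple

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Parse 'bytes=start-end', 'bytes=start-' or 'bytes=-suffix' into an
    inclusive (start, end) pair for a resource of `size` bytes. Returns None
    for malformed, multi-range, or unsatisfiable requests; the caller can then
    reply 416 Range Not Satisfiable or send the full file."""
    m = _RANGE_RE.match(header.strip())
    if not m or size <= 0:
        return None
    first, last = m.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix range: the last N bytes of the file.
        n = int(last)
        if n == 0:
            return None
        return (max(size - n, 0), size - 1)
    start = int(first)
    if start >= size:
        return None
    end = size - 1 if last == "" else min(int(last), size - 1)
    if end < start:
        return None
    return (start, end)


def read_range(data: bytes, header: str) -> Optional[bytes]:
    """Return the requested slice of `data`, or None if the range is unusable."""
    rng = parse_range(header, len(data))
    if rng is None:
        return None
    start, end = rng
    return data[start:end + 1]


print(parse_range("bytes=0-99", 1000))         # (0, 99)
print(read_range(b"0123456789", "bytes=2-4"))  # b'234'
```

A server would answer a successful parse with `206 Partial Content` and a `Content-Range: bytes start-end/size` header; many CDNs placed in front of a replay service can also serve such range requests themselves.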