Artificial Intelligence and Data Science Quiz: Test Your Skills!
Artificial Intelligence (AI) and Data Science are driving technological change across industries. Companies now rely heavily on intelligent systems and data-driven insights to innovate, optimize processes, and inform decisions. As competition for AI and Data Science roles grows, employers expect both conceptual understanding and practical application. We designed the AI and Data Science quiz with exactly that balance in mind.
The AI and Data Science quiz is designed to help learners build a technical foundation across these domains. It also lets you assess your analytical reasoning skills and prepare for interviews or certification programs. It covers key topics including Machine Learning, Deep Learning, Natural Language Processing, Model Evaluation, Data Preparation, Generative AI, MLOps, LLMs, and more, balancing theoretical understanding with practical relevance.
Whether you are a student, a practitioner in Data Science and AI, or an inquisitive learner, this quiz can help you build confidence in handling business problems, work through case studies, improve your problem-solving abilities, and distinguish yourself in tech interviews where critical thinking and applied AI capabilities are differentiators.
1. Which metric is most appropriate to evaluate a binary classifier on an imbalanced dataset?
- Accuracy
- Precision-Recall AUC
- Mean Squared Error
- R²
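To see why plain accuracy can mislead on imbalanced data, here is a minimal sketch in pure Python. The dataset (95 negatives, 5 positives) and the "always negative" classifier are made-up assumptions for illustration; precision-recall metrics are built from quantities like the recall computed here.

```python
# Toy imbalanced dataset (assumed numbers): 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A hypothetical classifier that always predicts the majority class.
y_pred = [0] * 100

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Fraction of actual positives that the classifier found."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

acc = accuracy(y_true, y_pred)  # high, even though no positive is ever found
rec = recall(y_true, y_pred)    # zero: every positive is missed
print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

The classifier looks strong on accuracy while being useless for the minority class, which is the gap that precision-recall based metrics are meant to expose.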
2. In supervised learning, what does label leakage mean?
- Model overfits to the training features
- Target information is accidentally included as a feature
- Data pipeline hasn’t been versioned
- Test set labels are hidden
3. Which tool or skill set is most important for moving a model from prototype to production?
- Tableau
- MLOps (CI/CD, monitoring)
- Excel pivot tables
- Manual batch scripts
4. Which programming language is most widely required for data science and ML roles?
- JavaScript
- Python
- R only
- PHP
5. What is “score matching” used for in modern generative models?
- Training the model to learn a denoising score function that approximates the gradient of the data distribution’s log probability
- Only matching model outputs to labels exactly
- Applying mean-squared error loss between random vectors
- Training solely with adversarial loss without score estimation
6. A resume lists “experience with LLMs.” What should hiring managers expect?
- Candidate can train LLMs from scratch
- Understands prompt design and responsible AI usage
- Expert in all NLP tasks
- Can deploy GPUs in a datacenter
7. Which dataset split strategy best simulates future, real-world model performance?
- Random split (train/test)
- Stratified sampling only
- Time-based split when data is temporal
- Using all data for training
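A time-based split can be sketched in a few lines of pure Python. The records, field layout, and the 80/20 fraction below are illustrative assumptions, not a recommended setting.

```python
from datetime import date

# Hypothetical records: (timestamp, feature, label), deliberately unsorted.
records = [
    (date(2024, 1, 5), 0.2, 0),
    (date(2024, 3, 9), 0.7, 1),
    (date(2024, 2, 1), 0.4, 0),
    (date(2024, 5, 20), 0.9, 1),
    (date(2024, 4, 2), 0.5, 1),
]

def time_based_split(rows, train_fraction=0.8):
    """Sort by timestamp and hold out the most recent rows for testing."""
    ordered = sorted(rows, key=lambda row: row[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

train, test = time_based_split(records)
print("train ends:", train[-1][0], "| test starts:", test[0][0])
```

Because every test row is later than every training row, the model is evaluated only on data it could not have seen at training time, unlike a random split.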
8. What complementary domain knowledge is most valuable for data scientists in healthcare?
- Finance accounting
- Clinical knowledge and regulatory understanding
- Video game design
- Culinary arts
9. What does feature engineering involve?
- Creating meaningful model inputs from raw data
- Building microservices
- Tuning network hyperparameters
- Designing UX dashboards
10. How is AI most likely to affect jobs according to recent studies?
- Replace 50% of jobs immediately
- Augment roles through hybrid tasks; retraining is common
- Affect only tech jobs
- No workplace effect
11. What is the main difference between bagging and boosting in ensemble learning?
- Bagging sequentially trains models, boosting trains independently
- Bagging trains models independently on random data subsets; boosting trains models sequentially focusing on previous errors
- Bagging only works for deep learning, boosting only for linear models
- Bagging reduces bias, boosting reduces variance
12. To maximize short-term salary growth in 2025, focus on:
- Historical statistics
- Building AI/ML and MLOps skills currently in demand
- Non-technical hobbies
- Typing faster
13. What is “prompt engineering” in the context of LLMs?
- Designing datasets
- Crafting model inputs strategically to shape output behavior
- Training models from scratch
- Building user interfaces
14. Which technique aligns AI model outputs with human preferences post-training?
- Batch normalization
- Reinforcement Learning from Human Feedback (RLHF)
- Dropout
- K-fold cross-validation
15. What is “few-shot learning”?
- Training with huge datasets
- Adapting with very few labeled examples
- Learning with no labels
- Transferring features across modalities
16. Transformers are best known for:
- Only image recognition
- Handling sequence data via attention mechanism
- Replacing CNNs
- Having few parameters
17. Major concern with generative AI in production:
- GPU memory only
- Hallucinations — generating false but plausible content
- Low output diversity
- Always deterministic outputs
18. What is “fine-tuning” in pre-trained models?
- Training from scratch
- Adjusting weights on a specific downstream dataset
- Only adjusting learning rates
- Freezing layers permanently
19. What is “zero-shot learning”?
- Performing tasks unseen during training
- Training with zero data
- Using zero GPU
- Only reinforcement learning
20. How can bias in ML models be reduced?
- Always enlarge the dataset
- Evaluate outcomes across subgroups and re-balance
- Ignore demographic data
- Remove feature engineering
21. Best metric for multi-class classification with imbalance:
- Accuracy only
- Macro vs. micro-averaged F1 scores
- R²
- Mean Absolute Error (MAE)
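The difference between macro- and micro-averaged F1 is easiest to see on a small example. This is a minimal pure-Python sketch; the three-class labels and the "always predict a" model are assumptions chosen to exaggerate the imbalance.

```python
def class_counts(y_true, y_pred, cls):
    """Return (tp, fp, fn) for one class treated as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    return tp, fp, fn

def f1_from_counts(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_micro_f1(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    counts = [class_counts(y_true, y_pred, c) for c in classes]
    # Macro: average the per-class F1 scores, so each class weighs the same.
    macro = sum(f1_from_counts(*c) for c in counts) / len(classes)
    # Micro: pool the counts first, so frequent classes dominate.
    micro = f1_from_counts(*(sum(col) for col in zip(*counts)))
    return macro, micro

y_true = ["a"] * 8 + ["b"] + ["c"]
y_pred = ["a"] * 10  # hypothetical model that ignores the rare classes
macro, micro = macro_micro_f1(y_true, y_pred)
print(f"macro F1={macro:.3f}, micro F1={micro:.3f}")
```

Micro F1 stays high because the majority class dominates the pooled counts, while macro F1 drops because the two rare classes each contribute a score of zero.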
22. What does “model distillation” mean?
- Extracting water
- Manual overfitting
- Data augmentation
- Training a smaller ‘student’ model to mimic a larger ‘teacher’ model
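One common ingredient of distillation is comparing temperature-softened teacher and student outputs. The sketch below, in pure Python, computes a cross-entropy between the two softened distributions. The logits and the temperature of 2.0 are made-up assumptions, and real setups often add a hard-label term and extra scaling that are omitted here.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher values give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student's soft predictions against the teacher's."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

teacher = [4.0, 1.0, 0.5]        # hypothetical teacher logits
close_student = [3.8, 1.1, 0.4]  # student that roughly mimics the teacher
far_student = [0.5, 4.0, 1.0]    # student that disagrees with the teacher

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(f"close={loss_close:.3f}, far={loss_far:.3f}")
```

The student that mimics the teacher gets the lower loss, which is the signal a training loop would minimize.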
23. What is “self-supervised learning”?
- Using labeled data only
- Pre-training on tasks where data creates its own labels
- Supervised with small labels
- Model supervising itself
24. Role of attention in transformers:
- Skip layers
- Weigh different inputs dynamically
- Reduce model size
- Hardware optimization
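The idea of weighing inputs dynamically can be sketched as scaled dot-product attention for a single query, in pure Python. The vectors are toy values chosen for illustration; real transformers use learned projections and many attention heads.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Weigh each value by how well its key matches the query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend(query, keys, values)
print([round(w, 3) for w in weights], [round(o, 3) for o in output])
```

The key most aligned with the query receives the largest weight, so its value contributes most to the output.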
25. What is “chain-of-thought prompting”?
- Prompting the model to reason step-by-step logically
- Sequential predictions only
- Chained prompts
- For images only
26. What is “multimodal AI”?
- CPUs + GPUs
- Models that handle multiple data types (text, image, audio)
- Only image models
- A network of small models
27. Why is Explainable AI or XAI important?
- To reduce performance
- Required for trust, fairness, and regulation
- Always increase accuracy
- Only academic interest
28. What is “differential privacy”?
- Encrypt parameters
- Ensures individuals can’t be re-identified from outputs
- Reduces model size
- Only public data use
29. Tool for versioning datasets and models:
- WordPress
- Excel
- Git + DVC or MLflow
- Manual naming
30. What is “data drift”?
- Missing documentation
- Changes in data distribution over time
- Perfect accuracy
- Feature order errors
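As a rough illustration of monitoring for data drift, the sketch below measures how far a feature's mean has moved, in units of the reference standard deviation. The sample values and the threshold of 2.0 are assumptions for demonstration; production monitoring usually relies on more robust statistical tests.

```python
from statistics import mean, stdev

def mean_shift_score(reference, current):
    """Absolute shift of the mean, in units of the reference standard deviation."""
    spread = stdev(reference)
    if spread == 0:
        return 0.0 if mean(current) == mean(reference) else float("inf")
    return abs(mean(current) - mean(reference)) / spread

reference = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]  # feature values at training time
stable = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0]     # later batch, same distribution
shifted = [12.1, 11.9, 12.3, 12.0, 12.2, 11.8]  # later batch, mean has moved

THRESHOLD = 2.0  # illustrative cut-off, not a standard value
for name, batch in [("stable", stable), ("shifted", shifted)]:
    score = mean_shift_score(reference, batch)
    print(name, round(score, 2), "drift" if score > THRESHOLD else "ok")
```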
31. “Concept drift” refers to:
- Model architecture changes
- The relationship between input features and the target variable changes over time
- Static features
- Perfect labels
32. What is “edge AI”?
- Cloud-only
- AI on-device at the network edge (IoT, phones)
- For video games
- For robotics only
33. What’s “federated learning”?
- Model aggregation without sharing raw data
- Central server training
- Poor quality datasets
- Clustering
34. What are “adversarial attacks”?
- Friendly competition
- Inputs crafted to fool models
- Normal noise
- Underfitting
35. What is “data whitening”?
- Removing colors
- Transforming features to have zero mean, unit variance, and no correlation with each other
- Normalize magnitudes
- Only for audio
36. NLP deployment trend:
- Rule-based systems
- Retrieval-Augmented Generation (RAG) with LLMs
- Static bag-of-words
- Avoid embeddings
37. What is “RAG”?
- Generating data
- Retrieving external info to ground generation
- Augmenting data
- Regularization
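The retrieval and grounding steps of RAG can be sketched without any model at all. In this pure-Python toy, retrieval is a simple word-overlap ranking and the generation step is left out. The document snippets, the function names, and the prompt wording are all hypothetical; real systems typically use embedding search and pass the prompt to an LLM.

```python
documents = [
    "DVC versions datasets alongside Git commits.",
    "Adam is an optimizer that adapts per-parameter learning rates.",
    "Byte-Pair Encoding merges frequent symbol pairs into subword tokens.",
]

def tokenize(text):
    """Lowercase, strip basic punctuation, and split into a set of words."""
    return set(text.lower().replace(".", "").replace("?", "").split())

def retrieve(question, docs, k=1):
    """Rank documents by word overlap with the question and keep the top k."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, docs):
    """Place retrieved text ahead of the question to ground the answer."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("What does Byte-Pair Encoding merge?", documents)
print(prompt)
```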
38. Transformer scaling law:
- Relates model/data size and performance gains
- Only layer count
- Hardware only
- Shrinking models
39. Tokenization innovation:
- Word-by-word only
- Byte-Pair Encoding (BPE), SentencePiece
- Raw ASCII
- Remove tokenization
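A single merge step of Byte-Pair Encoding can be sketched in pure Python. The tiny word-frequency table is an assumption for illustration, and production tokenizers add details (byte-level fallback, special tokens, many merge rounds) that are not shown.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of the pair with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

# Hypothetical corpus: words split into characters, with their counts.
vocab = {("l", "o", "w"): 5, ("l", "o", "t"): 2, ("n", "e", "w"): 3}
pair = most_frequent_pair(vocab)
vocab_after = merge_pair(vocab, pair)
print(pair, vocab_after)
```

Repeating this step many times builds a vocabulary of subword units, so frequent words become single tokens while rare words are split into smaller pieces.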
40. Vision Transformers (ViT):
- CNNs with layers
- Apply transformer design to image patches
- Object detection only
- Outdated models
41. What is “prompt injection”?
- Malicious input that alters AI behavior
- Missing prompts
- Only image models
- Hardware bug
42. What is “meta-learning”?
- One-task learning
- Learning to adapt quickly to new tasks
- Model ensembles
- Same as RL
43. What are “foundation models”?
- Task-specific models
- Large pre-trained adaptable models
- Built from scratch
- Vision only
44. “Scale-to-zero” in deployment means:
- Resources scale down to zero when idle
- Zero compute ways
- Permanent shutdown
- Misnomer
45. Benchmarking in ML:
- Paper comparisons only
- Systematic performance testing
- Models without data
- Irrelevant
46. What is “synthetic data”?
- Noisy data
- Artificially generated to supplement real data
- Unlabeled data
- Always poor quality
47. What is “data augmentation”?
- Reduce dataset size
- Generate modified data versions
- Tabular only
- Cleaning method
48. Lottery Ticket Hypothesis:
- Random training
- Small subnetworks can match full model performance
- Synthetic data
- Scheduling
49. Bias-variance trade-off:
- Overfitting only
- Balance complexity and generalization
- Simplify model
- Linear only
50. Transfer Learning:
- Adapt a model trained on one task to another
- Transfer hardware
- Small dataset reuse
- Equal classes
51. Regularization in ML:
- High learning rates
- Prevent overfitting (L1, L2, dropout)
- Reduce parameters only
- Deep learning only
52. Attention heatmaps help:
- Show unused inputs
- Visualize influential inputs for interpretability
- Only images
- Explain model size
53. Ensuring AI ethics in regulated industries involves:
- Ignoring transparency
- Privacy checks, fairness audits, documentation
- Only performance metrics
- Last-minute checks
54. What is “AI Hallucination”?
- Irrelevant factual content
- Plausible but false model outputs
- Refusal to answer
- 100% accuracy
55. Framework for tracking ML experiments:
- TensorFlow only
- MLflow, Weights & Biases
- Spreadsheets
- Word docs
56. Hyperparameter tuning:
- Auto architecture design
- Optimizing non-learned parameters
- Labeling datasets
- Preprocessing only
57. Open-source LLM trend:
- All proprietary
- High-quality open weight models for fine-tuning
- Only small ones
- Not useful
58. Foundation model risks include:
- Bias, misuse, privacy, energy cost
- Training cost only
- Always fair
- Hardware failure
59. Multi-task learning:
- Training a model for multiple related tasks
- Multiple datasets
- Sequential tasks only
- RL only
60. Token efficiency significance:
- Tokens irrelevant
- Compute cost per token critical to performance
- Training only
- Translation only
61. Peak GPU/TPU demand means:
- Constant demand
- Academic use only
- Planning for compute usage spikes
- Irrelevant in cloud
62. What is “Responsible AI”?
- Avoiding bad PR
- Legal compliance only
- Transparent, fair, safe, privacy-preserving systems aligned with human values
- Ethics papers only
63. What approach helps a generative AI model avoid making up facts (i.e., hallucinating)?
- Reinforcement learning guided by human feedback with grounding checks on external knowledge
- Only increasing dataset size without context grounding
- Freezing model weights permanently after initial pre-training
- Reinforcement learning guided by human feedback combined with retrieval or grounding in verified knowledge sources
64. What is the principle behind “diffusion models” in generative modeling?
- They learn to iteratively denoise random noise through a forward-reverse diffusion process to generate coherent samples
- Randomly initializing weights without noise schedules
- Always training in one pass without steps
- Using deterministic functions with no stochastic process
65. What does “prompt chaining” achieve in GenAI applications?
- It connects multiple prompts where the output of one serves as input to the next, enabling multi-step reasoning or task completion
- Only sending a single prompt without structure
- Concatenating prompts without logical order
- Running completely independent prompts in parallel
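Prompt chaining can be sketched with a stand-in for the model. In the pure-Python example below, `fake_model` is a hypothetical stub that returns canned text, and the prompt wording is invented. A real application would replace it with an actual LLM call.

```python
def fake_model(prompt):
    """Stand-in for an LLM call; returns canned responses for illustration."""
    if prompt.startswith("Extract keywords:"):
        return "drift, monitoring"
    if prompt.startswith("Write a title about:"):
        return "Title: " + prompt.split(":", 1)[1].strip()
    return ""

def run_chain(text, model):
    """Feed the output of the first prompt into the second prompt."""
    keywords = model(f"Extract keywords: {text}")
    title = model(f"Write a title about: {keywords}")
    return keywords, title

keywords, title = run_chain(
    "Models degrade when data drift goes unmonitored.", fake_model
)
print(keywords)
print(title)
```

The second prompt is built from the first response, which is what separates a chain from independent prompts run in parallel.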
66. What best describes the concept of Agentic AI in current research and industry discussions?
- AI models that only respond to user queries without taking initiative
- AI systems that can autonomously plan, reason, take action, and adapt to achieve complex goals with minimal human intervention
- Traditional supervised models trained purely on labeled datasets
- Chatbots that follow static, rule-based decision trees
67. What component allows Agentic AI systems to maintain context across multiple interactions?
- A persistent memory architecture that stores, retrieves, and updates contextual knowledge throughout multi-step reasoning
- Pretrained embeddings fixed after initial training
- Static prompts repeated for every query
- Random sampling of unrelated responses for variety
68. What does a neuron in a neural network primarily do?
- Transmits data between servers
- Applies a weighted sum and activation function to inputs to generate an output
- Stores model parameters permanently
- Generates random outputs for learning diversity
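The computation inside a single neuron fits in a few lines of pure Python. The inputs, weights, and the choice of a sigmoid activation below are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# The weighted sum is 1.0 * 0.5 + 2.0 * (-0.25) + 0.0 = 0.0
out = neuron([1.0, 2.0], [0.5, -0.25], bias=0.0)
print(out)
```

Swapping `sigmoid` for another function, such as ReLU, changes the output range but keeps the same weighted-sum structure.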
69. What is the purpose of an activation function in a neural network?
- To store training data
- To introduce non-linearity and allow the network to learn complex patterns
- To reduce gradient updates
- To increase model size for large datasets
70. Which optimizer is commonly used to improve convergence in deep learning models?
- Linear Regression
- K-Means
- Adam (Adaptive Moment Estimation)