Artificial Intelligence and Data Science Quiz: Test Your Skills!

Artificial Intelligence (AI) and Data Science are fueling the technological evolution across all industries today. Companies now rely heavily on intelligent systems and data-driven insights to innovate, optimize processes, and inform decisions. As competition for employment opportunities in AI and Data Science grows, employers expect a balance of conceptual understanding and practical application. That is exactly why we have designed the AI and Data Science quiz.

The AI and Data Science quiz has been created to help learners strengthen their technical foundation across the many facets of these domains. It will also let you assess your analytical reasoning skills and prepare for interviews or certification programs. It covers important topics including Machine Learning, Deep Learning, Natural Language Processing, Model Evaluation, Data Preparation, Generative AI, MLOps, LLMs, and much more, delivering a balance of theoretical understanding and practical relevance.

So, whether you are a student, a practitioner in Data Science and AI, or simply an inquisitive learner, this quiz will help you build confidence in handling business problems, attempting case studies, and sharpening your problem-solving abilities, and it will help you distinguish yourself in tech interviews where critical thinking and applied AI capabilities are differentiators.

1. Which metric is most appropriate to evaluate a binary classifier on an imbalanced dataset?

  • Accuracy
  • Precision-Recall AUC
  • Mean Squared Error
  • R² (coefficient of determination)
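
For intuition, here is a minimal scikit-learn sketch (with made-up data) of why accuracy misleads on imbalanced classes while Precision-Recall AUC tracks the positive base rate:

```python
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score

y_true = np.array([0] * 95 + [1] * 5)            # 95% negatives, 5% positives
y_pred = np.zeros(100, dtype=int)                # a model that always predicts 0
y_score = np.random.RandomState(0).rand(100)     # uninformative scores

print(accuracy_score(y_true, y_pred))            # 0.95, yet the model learned nothing
print(average_precision_score(y_true, y_score))  # PR AUC near the 0.05 base rate
```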

2. In supervised learning, what does label leakage mean?

  • Model overfit to the training features
  • Target information is accidentally included as a feature
  • Data pipeline hasn’t been versioned
  • Test set labels are hidden

3. Which tool or skill set is important for moving a model from prototype to production?

  • Tableau
  • MLOps (CI/CD, monitoring)
  • Excel pivot tables
  • Manual batch scripts

4. Which programming language is most widely required for data science and ML roles?

  • JavaScript
  • Python
  • R only
  • PHP

5. What is “score matching” used for in modern generative models?

  • Training the model to learn a denoising score function that approximates the gradient of the data distribution’s log probability
  • Only matching model outputs to labels exactly
  • Applying mean-squared error loss between random vectors
  • Training solely with adversarial loss without score estimation
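
To make the denoising score function idea concrete, here is a hedged PyTorch sketch of a denoising score matching loss; `score_net` is a hypothetical network that predicts the score at noise level `sigma`:

```python
import torch

def dsm_loss(score_net, x, sigma=0.1):
    noise = torch.randn_like(x) * sigma
    x_noisy = x + noise
    target = -noise / sigma**2           # score of the Gaussian perturbation kernel
    pred = score_net(x_noisy)            # hypothetical score network
    return sigma**2 * ((pred - target) ** 2).sum(dim=-1).mean()
```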

6. A resume lists “experience with LLMs.” What should hiring managers expect?

  • Candidate can train LLMs from scratch
  • Understands prompt design and responsible AI usage
  • Expert in all NLP tasks
  • Can deploy GPUs in a datacenter

7. Which dataset split strategy best simulates future, real-world model performance?

  • Random split (train/test)
  • Stratified sampling only
  • Time-based split when data is temporal
  • Using all data for training
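
When data is temporal, scikit-learn's TimeSeriesSplit produces exactly this kind of split, always training on the past and validating on the future:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)   # assume rows are already in time order
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    print("train up to", train_idx[-1], "| test", test_idx[0], "to", test_idx[-1])
```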

8. What complementary domain knowledge is most valuable for data scientists in healthcare?

  • Finance accounting
  • Clinical knowledge and regulatory understanding
  • Video game design
  • Culinary arts

9. What does feature engineering involve?

  • Creating meaningful model inputs from raw data
  • Building microservices
  • Tuning network hyperparameters
  • Designing UX dashboards
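
A small illustrative pandas example of turning raw columns into model-ready inputs (the column names here are assumptions, not from any real dataset):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2025-01-03 09:15", "2025-01-04 22:40"]),
    "amount": [120.0, 15.5],
})
df["hour"] = df["timestamp"].dt.hour                  # capture time-of-day effects
df["is_weekend"] = df["timestamp"].dt.dayofweek >= 5  # behavioral flag
df["log_amount"] = np.log1p(df["amount"])             # tame a skewed distribution
```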

10. How is AI most likely to affect jobs according to recent studies?

  • Replace 50% of jobs immediately
  • Augment roles through hybrid tasks; retraining is common
  • Affect only tech jobs
  • No workplace effect

11. What is the main difference between bagging and boosting in ensemble learning?

  • Bagging sequentially trains models, boosting trains independently
  • Bagging trains models independently on random data subsets; boosting trains models sequentially focusing on previous errors
  • Bagging only works for deep learning, boosting only for linear models
  • Bagging reduces bias, boosting reduces variance
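
The contrast is easy to see in scikit-learn: a random forest is a bagging-style ensemble of independently trained trees, while gradient boosting fits each new tree sequentially on the previous ensemble's errors:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)
bagging = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)      # independent trees
boosting = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X, y) # sequential trees
```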

12. To maximize short-term salary growth in 2025, focus on:

  • Historical statistics
  • Building AI/ML and MLOps skills currently in demand
  • Non-technical hobbies
  • Typing faster

13. What is “prompt engineering” in the context of LLMs?

  • Designing datasets
  • Crafting model inputs strategically to shape output behavior
  • Training models from scratch
  • Building user interfaces

14. Which technique aligns AI model outputs with human preferences post-training?

  • Batch normalization
  • Reinforcement Learning from Human Feedback (RLHF)
  • Dropout
  • K-fold cross-validation

15. What is “few-shot learning”?

  • Training with huge datasets
  • Adapting with very few labeled examples
  • Learning with no labels
  • Transferring features across modalities

16. Transformers are best known for:

  • Only image recognition
  • Handling sequence data via attention mechanism
  • Replacing CNNs
  • Having few parameters
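
Here is a minimal NumPy sketch of the scaled dot-product attention that transformers are built on: every position dynamically weighs every other position:

```python
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over positions
    return weights @ V                                 # weighted sum of values

Q = K = V = np.random.rand(4, 8)                       # 4 tokens, dimension 8
print(attention(Q, K, V).shape)                        # (4, 8)
```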

17. Major concern with generative AI in production:

  • GPU memory only
  • Hallucinations — generating false but plausible content
  • Low output diversity
  • Always deterministic outputs

18. What is “fine-tuning” in pre-trained models?

  • Training from scratch
  • Adjusting weights on a specific downstream dataset
  • Only adjusting learning rates
  • Freezing layers permanently
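
A hedged PyTorch sketch of the usual recipe: freeze a pre-trained backbone (a placeholder module here) and train only a new task head on the downstream dataset:

```python
import torch.nn as nn

class FineTunedModel(nn.Module):
    def __init__(self, backbone, n_classes, feat_dim=512):  # feat_dim is an assumption
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                 # freeze the pre-trained weights
        self.head = nn.Linear(feat_dim, n_classes)  # only this layer is trained

    def forward(self, x):
        return self.head(self.backbone(x))
```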

19. What is “zero-shot learning”?

  • Performing tasks unseen during training
  • Training with zero data
  • Using zero GPU
  • Only reinforcement learning

20. How can bias in ML models be reduced?

  • Always enlarge the dataset
  • Evaluate outcomes across subgroups and re-balance
  • Ignore demographic data
  • Remove feature engineering

21. Best metric for multi-class classification with imbalance:

  • Accuracy only
  • Macro- or micro-averaged F1 scores
  • R²
  • Mean Absolute Error (MAE)
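
A quick scikit-learn illustration: micro-averaged F1 equals accuracy here and hides the rare classes, while macro-averaging exposes them (scikit-learn warns that F1 is undefined for classes that are never predicted and scores them 0):

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0]   # always predicts the majority class
print(f1_score(y_true, y_pred, average="micro"))  # ~0.67
print(f1_score(y_true, y_pred, average="macro"))  # ~0.27
```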

22. What does “model distillation” mean?

  • Extracting water
  • Manual overfitting
  • Data augmentation
  • Training a smaller ‘student’ model to mimic a larger ‘teacher’ model

23. What is “self-supervised learning”?

  • Using labeled data only
  • Pre-training on tasks where data creates its own labels
  • Supervised with small labels
  • Model supervising itself

24. Role of attention in transformers:

  • Skip layers
  • Weigh different inputs dynamically
  • Reduce model size
  • Hardware optimization

25. What is “chain-of-thought prompting”?

  • Prompting the model to reason step-by-step logically
  • Sequential predictions only
  • Chained prompts
  • For images only

26. What is “multimodal AI”?

  • CPUs + GPUs
  • Models that handle multiple data types (text, image, audio)
  • Only image models
  • A network of small models

27. Why is Explainable AI (XAI) important?

  • To reduce performance
  • Required for trust, fairness, and regulation
  • Always increase accuracy
  • Only academic interest

28. What is “differential privacy”?

  • Encrypt parameters
  • Ensures individuals can’t be re-identified from outputs
  • Reduces model size
  • Only public data use

29. Tool for versioning datasets and models:

  • WordPress
  • Excel
  • Git + DVC or MLflow
  • Manual naming

30. What is “data drift”?

  • Missing documentation
  • Changes in data distribution over time
  • Perfect accuracy
  • Feature order errors
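
One lightweight way to detect such a shift is a two-sample Kolmogorov-Smirnov test per feature, sketched here with SciPy on synthetic data:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 1000)
live_feature = rng.normal(0.5, 1.0, 1000)    # the mean has shifted in production

stat, p = ks_2samp(train_feature, live_feature)
print(f"KS statistic={stat:.3f}, p-value={p:.2e}")  # tiny p-value signals drift
```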

31. “Concept drift” refers to:

  • Model architecture changes
  • Target variable definition/distribution changes
  • Static features
  • Perfect labels

32. What is “edge AI”?

  • Cloud-only
  • AI on-device at the network edge (IoT, phones)
  • For video games
  • For robotics only

33. What is “federated learning”?

  • Model aggregation without sharing raw data
  • Central server training
  • Poor quality datasets
  • Clustering

34. What are “adversarial attacks”?

  • Friendly competition
  • Inputs crafted to fool models
  • Normal noise
  • Underfitting

35. What is “data whitening”?

  • Removing colors
  • Transforming features so they are decorrelated, with zero mean and unit variance
  • Normalize magnitudes
  • Only for audio
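
A NumPy sketch of PCA whitening: center the data, then rotate onto the covariance eigenbasis and rescale so the result has (approximately) identity covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 1.0, 0.0],
                                          [0.0, 1.0, 0.0],
                                          [0.0, 0.0, 3.0]])   # correlated features
Xc = X - X.mean(axis=0)                                       # zero mean
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
X_white = (Xc @ eigvecs) / np.sqrt(eigvals)                   # decorrelate and rescale
print(np.cov(X_white, rowvar=False).round(2))                 # ~ identity matrix
```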

36. NLP deployment trend:

  • Rule-based systems
  • Retrieval-Augmented Generation (RAG) with LLMs
  • Static bag-of-words
  • Avoid embeddings

37. What is “RAG”?

  • Generating data
  • Retrieving external info to ground generation
  • Augmenting data
  • Regularization
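
A toy sketch of the retrieve-then-generate pattern, where `embed` and `llm` are hypothetical stand-ins for a real embedding model and a real LLM API:

```python
import numpy as np

def rag_answer(question, documents, embed, llm, k=2):
    doc_vecs = np.stack([embed(d) for d in documents])  # index the corpus
    scores = doc_vecs @ embed(question)                 # relevance to the query
    context = "\n".join(documents[i] for i in np.argsort(scores)[-k:])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)                                  # generation grounded in retrieval
```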

38. Transformer scaling laws:

  • Relates model/data size and performance gains
  • Only layer count
  • Hardware only
  • Shrinking models

39. Tokenization innovation:

  • Word-by-word only
  • Byte-Pair Encoding (BPE), SentencePiece
  • Raw ASCII
  • Remove tokenization
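
The core of BPE fits in a few lines; this toy sketch performs a single merge step, replacing the most frequent adjacent symbol pair with a new token:

```python
from collections import Counter

tokens = list("low lower lowest".replace(" ", "_"))
pairs = Counter(zip(tokens, tokens[1:]))      # count adjacent symbol pairs
best = max(pairs, key=pairs.get)              # most frequent pair, ('l', 'o') here
merged, i = [], 0
while i < len(tokens):
    if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
        merged.append(tokens[i] + tokens[i + 1]); i += 2   # merge the pair
    else:
        merged.append(tokens[i]); i += 1
print(best, merged)
```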

40. Vision Transformers (ViT):

  • CNNs with layers
  • Apply transformer design to image patches
  • Object detection only
  • Outdated models

41. What is “prompt injection”?

  • Malicious input that alters AI behavior
  • Missing prompts
  • Only image models
  • Hardware bug

42. What is “meta-learning”?

  • One-task learning
  • Learning to adapt quickly to new tasks
  • Model ensembles
  • Same as RL

43. What are “foundation models”?

  • Task-specific models
  • Large pre-trained adaptable models
  • Built from scratch
  • Vision only

44. “Scale-to-zero” in deployment means:

  • Resources scale down to zero when idle
  • Zero compute always
  • Permanent shutdown
  • Misnomer

45. Benchmarking in ML:

  • Paper comparisons only
  • Systematic performance testing
  • Models without data
  • Irrelevant

46. What is “synthetic data”?

  • Noisy data
  • Artificially generated to supplement real data
  • Unlabeled data
  • Always poor quality

47. What is “data augmentation”?

  • Reduce dataset size
  • Generate modified data versions
  • Tabular only
  • Cleaning method

48. Lottery Ticket Hypothesis:

  • Random training
  • Small subnetworks can match full model performance
  • Synthetic data
  • Scheduling

49. Bias-variance trade-off:

  • Overfitting only
  • Balance complexity and generalization
  • Simplify model
  • Linear only

50. Transfer Learning:

  • Adapt a model trained on one task to another
  • Transfer hardware
  • Small dataset reuse
  • Equal classes

51. Regularization in ML:

  • High learning rates
  • Prevent overfitting (L1, L2, dropout)
  • Reduce parameters only
  • Deep learning only
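
In scikit-learn, L2 regularization is one estimator away: Ridge shrinks coefficients via its `alpha` strength, which helps when the model is too flexible for the data:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=50, n_features=40, noise=5.0, random_state=0)
plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)           # L2 penalty shrinks the weights
print(abs(plain.coef_).max(), abs(ridge.coef_).max())
```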

52. Attention heatmaps help:

  • Show unused inputs
  • Visualize influential inputs for interpretability
  • Only images
  • Explain model size

53. Ensuring AI ethics in regulated industries involves:

  • Ignoring transparency
  • Privacy checks, fairness audits, documentation
  • Only performance metrics
  • Last-minute checks

54. What is “AI Hallucination”?

  • Irrelevant factual content
  • Plausible but false model outputs
  • Refusal to answer
  • 100% accuracy

55. Framework for tracking ML experiments:

  • TensorFlow only
  • MLflow, Weights & Biases
  • Spreadsheets
  • Word docs

56. Hyperparameter tuning:

  • Auto architecture design
  • Optimizing non-learned parameters
  • Labeling datasets
  • Preprocessing only
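
A standard example: GridSearchCV cross-validates every combination of non-learned parameters (here, an SVM's `C` and kernel) and reports the best one:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}   # non-learned parameters
search = GridSearchCV(SVC(), grid, cv=5).fit(X, y)
print(search.best_params_)
```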

57. Open-source LLM trend:

  • All proprietary
  • High-quality open weight models for fine-tuning
  • Only small ones
  • Not useful

58. Foundation model risks include:

  • Bias, misuse, privacy, energy cost
  • Training cost only
  • Always fair
  • Hardware failure

59. Multi-task learning:

  • Training a model for multiple related tasks
  • Multiple datasets
  • Sequential tasks only
  • RL only

60. Token efficiency significance:

  • Tokens irrelevant
  • Compute cost per token critical to performance
  • Training only
  • Translation only

61. Peak GPU/TPU demand means:

  • Constant demand
  • Academic use only
  • Planning for compute usage spikes
  • Irrelevant in cloud

62. What is “Responsible AI”?

  • Avoiding bad PR
  • Legal compliance only
  • Transparent, fair, safe, privacy-preserving systems aligned with human values
  • Ethics papers only

63. What approach helps a generative AI model avoid making up facts (i.e., hallucinating)?

  • Reinforcement learning from human feedback alone, with no grounding in external knowledge
  • Only increasing dataset size without context grounding
  • Freezing model weights permanently after first pre-train
  • Reinforcement learning guided by human feedback combined with retrieval or grounding in verified knowledge sources

64. What is the principle behind “diffusion models” in generative modeling?

  • They learn to iteratively denoise random noise through a forward-reverse diffusion process to generate coherent samples
  • Randomly initializing weights without noise schedules
  • Always training in one pass without steps
  • Using deterministic functions with no stochastic process
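
A hedged PyTorch sketch of one DDPM-style training step: noise a clean sample at a random timestep and train `eps_net` (a placeholder network) to predict the added noise, i.e., to learn to reverse the forward diffusion:

```python
import torch

def diffusion_train_step(eps_net, x0, alphas_cumprod):
    t = torch.randint(0, len(alphas_cumprod), (x0.shape[0],))
    a = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    noise = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise     # forward (noising) process
    return ((eps_net(x_t, t) - noise) ** 2).mean()   # learn to undo it
```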

65. What does “prompt chaining” achieve in GenAI applications?

  • It connects multiple prompts where the output of one serves as input to the next, enabling multi-step reasoning or task completion
  • Only sending a single prompt without structure
  • Concatenating prompts without logical order
  • Running completely independent prompts in parallel
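
A toy chain where each call's output feeds the next prompt; `llm` is a hypothetical completion function, not any specific vendor API:

```python
def answer_with_chain(llm, question):
    facts = llm(f"List the key facts needed to answer: {question}")
    draft = llm(f"Using these facts:\n{facts}\nDraft an answer to: {question}")
    return llm(f"Check this draft for errors and return a final answer:\n{draft}")
```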

66. What best describes the concept of Agentic AI in current research and industry discussions?

  • AI models that only respond to user queries without taking initiative
  • AI systems that can autonomously plan, reason, take action, and adapt to achieve complex goals with minimal human intervention
  • Traditional supervised models trained purely on labeled datasets
  • Chatbots that follow static, rule-based decision trees

67. What component allows Agentic AI systems to maintain context across multiple interactions?

  • A persistent memory architecture that stores, retrieves, and updates contextual knowledge throughout multi-step reasoning
  • Pretrained embeddings fixed after initial training
  • Static prompts repeated for every query
  • Random sampling of unrelated responses for variety

68. What does a neuron in a neural network primarily do?

  • Transmits data between servers
  • Applies a weighted sum and activation function to inputs to generate an output
  • Stores model parameters permanently
  • Generates random outputs for learning diversity
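
That weighted-sum-plus-activation behavior is a one-liner in NumPy; the ReLU in this sketch is also the non-linearity the next question asks about:

```python
import numpy as np

def neuron(x, w, b):
    return np.maximum(0.0, np.dot(w, x) + b)   # ReLU(w . x + b)

print(neuron(np.array([1.0, 2.0]), np.array([0.5, -0.25]), 0.1))  # 0.1
```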

69. What is the purpose of an activation function in a neural network?

  • To store training data
  • To introduce non-linearity and allow the network to learn complex patterns
  • To reduce gradient updates
  • To increase model size for large datasets

70. Which optimizer is commonly used to improve convergence in deep learning models?

  • Adam
  • Grid search
  • K-means clustering
  • Manual weight updates